From 4a63533216f68d5f05b5f99b2d26cb44a1666494 Mon Sep 17 00:00:00 2001
From: Apurv Agarwal
Date: Tue, 8 Aug 2023 22:40:11 +0530
Subject: [PATCH] addition to docs at 'Store and reference chat history' (#8910)

- Description: I have added an example showing how to pass a custom template to ConversationalRetrievalChain. Instead of CONDENSE_QUESTION_PROMPT, any prompt can be passed in the argument condense_question_prompt. Look in Use cases -> QA over Documents -> How to -> Store and reference chat history.
- Issue: #8864
- Dependencies: NA
- Tag maintainer: @hinthornw
- Twitter handle:

---------

Co-authored-by: Bagatur
---
 .../modules/chains/popular/chat_vector_db.mdx | 101 +++++++++++-------
 1 file changed, 63 insertions(+), 38 deletions(-)

diff --git a/docs/snippets/modules/chains/popular/chat_vector_db.mdx b/docs/snippets/modules/chains/popular/chat_vector_db.mdx
index 66dfc6602b9..c68ada0d5a5 100644
--- a/docs/snippets/modules/chains/popular/chat_vector_db.mdx
+++ b/docs/snippets/modules/chains/popular/chat_vector_db.mdx
@@ -8,7 +8,6 @@ from langchain.chains import ConversationalRetrievalChain

Load in documents. You can replace this with a loader for whatever type of data you want
-
```python
from langchain.document_loaders import TextLoader
loader = TextLoader("../../state_of_the_union.txt")
@@ -17,7 +16,6 @@ documents = loader.load()

If you had multiple loaders that you wanted to combine, you do something like:
-
```python
# loaders = [....]
# docs = []
@@ -27,7 +25,6 @@ If you had multiple loaders that you wanted to combine, you do something like:

We now split the documents, create embeddings for them, and put them in a vectorstore. This allows us to do semantic search over them.
-
```python
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
@@ -46,7 +43,6 @@ vectorstore = Chroma.from_documents(documents, embeddings)

We can now create a memory object, which is necessary to track the inputs/outputs and hold a conversation.
-
```python
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
@@ -54,18 +50,15 @@ memory = ConversationBufferMemory(memory_key="chat_history", return_messages=Tru

We now initialize the `ConversationalRetrievalChain`
-
```python
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), memory=memory)
```
-
```python
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query})
```
-
```python
result["answer"]
```
@@ -78,13 +71,11 @@ result["answer"]
-
```python
query = "Did he mention who she succeeded"
result = qa({"question": query})
```
-
```python
result['answer']
```
@@ -101,21 +92,18 @@ result['answer']

In the above example, we used a Memory object to track chat history. We can also just pass it in explicitly. In order to do this, we need to initialize a chain without any memory object.
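Before switching to an explicit `chat_history`, it can be useful to check what the memory object above actually recorded for the two questions that were asked. This quick check is not part of the original page; it assumes the `memory` object created earlier with `return_messages=True`:

```python
# Peek at what ConversationBufferMemory stored for the two questions above.
# With return_messages=True the history is a list of message objects.
history = memory.load_memory_variables({})["chat_history"]
for message in history:
    print(f"{message.type}: {message.content}")
```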
-
```python
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever())
```

Here's an example of asking a question with no chat history
-
```python
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
```
-
```python
result["answer"]
```
@@ -130,14 +118,12 @@ result["answer"]

Here's an example of asking a question with some chat history
-
```python
chat_history = [(query, result["answer"])]
query = "Did he mention who she succeeded"
result = qa({"question": query, "chat_history": chat_history})
```
-
```python
result['answer']
```
@@ -154,12 +140,10 @@ result['answer']

This chain has two steps. First, it condenses the current question and the chat history into a standalone question. This is necessary to create a standalone vector to use for retrieval. After that, it does retrieval and then answers the question using retrieval augmented generation with a separate model. Part of the power of the declarative nature of LangChain is that you can easily use a separate language model for each call. This can be useful when you want a cheaper, faster model for the simpler task of condensing the question and a more expensive model for answering it. Here is an example of doing so.
-
```python
from langchain.chat_models import ChatOpenAI
```
-
@@ -168,36 +152,90 @@ qa = ConversationalRetrievalChain.from_llm(
```python
qa = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0, model="gpt-4"),
    vectorstore.as_retriever(),
    condense_question_llm = ChatOpenAI(temperature=0, model='gpt-3.5-turbo'),
)
```
-
```python
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
```
-
```python
chat_history = [(query, result["answer"])]
query = "Did he mention who she succeeded"
result = qa({"question": query, "chat_history": chat_history})
```

-## Return Source Documents
-You can also easily return source documents from the ConversationalRetrievalChain. This is useful for when you want to inspect what documents were returned.
+## Using a custom prompt for condensing the question
+
+By default, `ConversationalRetrievalChain` uses `CONDENSE_QUESTION_PROMPT` to condense the question and the chat history into a standalone question. This is how that prompt is defined in the library:
+
+```python
+from langchain.prompts.prompt import PromptTemplate
+
+_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
+
+Chat History:
+{chat_history}
+Follow Up Input: {question}
+Standalone question:"""
+CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
+```
+
+Instead of this default, any custom template can be used to further augment the question or to give the LLM extra instructions. Here is an example:
+
+```python
+from langchain.prompts.prompt import PromptTemplate
+```
+
+```python
+custom_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question. At the end of the standalone question add this: 'Answer the question in German language.' If you do not know the answer, reply with 'I am sorry'.
+Chat History:
+{chat_history}
+Follow Up Input: {question}
+Standalone question:"""
+```
+
+```python
+CUSTOM_QUESTION_PROMPT = PromptTemplate.from_template(custom_template)
+```
+
+```python
+# Reuse the `vectorstore` built earlier on this page and give this chain its own memory.
+model = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
+memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
+qa = ConversationalRetrievalChain.from_llm(
+    model,
+    vectorstore.as_retriever(),
+    condense_question_prompt=CUSTOM_QUESTION_PROMPT,
+    memory=memory,
+)
+```
+
+```python
+query = "What did the president say about Ketanji Brown Jackson"
+result = qa({"question": query})
+```
+
+```python
+query = "Did he mention who she succeeded"
+result = qa({"question": query})
+```
+
+## Return Source Documents
+
You can also easily return source documents from the ConversationalRetrievalChain. This is useful for when you want to inspect what documents were returned.

```python
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)
```
-
```python
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
```
-
```python
result['source_documents'][0]
```
@@ -211,14 +249,13 @@ result['source_documents'][0]

## ConversationalRetrievalChain with `search_distance`
-If you are using a vector store that supports filtering by search distance, you can add a threshold value parameter.
+If you are using a vector store that supports filtering by search distance, you can add a threshold value parameter.

```python
vectordbkwargs = {"search_distance": 0.9}
```
-
```python
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history, "vectordbkwargs": vectordbkwargs})
```
@@ -227,8 +264,8 @@ result = qa({"question": query, "chat_history": chat_history, "vectordbkwargs":

## ConversationalRetrievalChain with `map_reduce`
-We can also use different types of combine document chains with the ConversationalRetrievalChain chain.
+We can also use different types of combine document chains with the ConversationalRetrievalChain chain.

```python
from langchain.chains import LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
```
@@ -236,7 +273,6 @@
-
```python
llm = OpenAI(temperature=0)
question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)
doc_chain = load_qa_chain(llm, chain_type="map_reduce")

chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
)
```
@@ -249,14 +285,12 @@ chain = ConversationalRetrievalChain(
-
```python
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = chain({"question": query, "chat_history": chat_history})
```
-
```python
result['answer']
```
@@ -273,12 +307,10 @@ result['answer']

You can also use this chain with the question answering with sources chain.
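The wiring is the same as in the `map_reduce` example above: build a combine-documents chain with `load_qa_chain` (or, below, `load_qa_with_sources_chain`) and pass it in as `combine_docs_chain`. As a rough sketch of how other strategies slot in (reusing the `llm`, `question_generator`, and `vectorstore` objects defined earlier; this variant is not part of the original page), a `refine` version would look like this:

```python
# Hypothetical variant: same ConversationalRetrievalChain wiring, but with the
# "refine" combine-documents strategy instead of "map_reduce".
refine_doc_chain = load_qa_chain(llm, chain_type="refine")

refine_chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=refine_doc_chain,
)

result = refine_chain(
    {"question": "What did the president say about Ketanji Brown Jackson", "chat_history": []}
)
```

The sources-chain example below follows exactly the same pattern.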
- ```python from langchain.chains.qa_with_sources import load_qa_with_sources_chain ``` - ```python llm = OpenAI(temperature=0) question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT) @@ -291,14 +323,12 @@ chain = ConversationalRetrievalChain( ) ``` - ```python chat_history = [] query = "What did the president say about Ketanji Brown Jackson" result = chain({"question": query, "chat_history": chat_history}) ``` - ```python result['answer'] ``` @@ -315,7 +345,6 @@ result['answer'] Output from the chain will be streamed to `stdout` token by token in this example. - ```python from langchain.chains.llm import LLMChain from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler @@ -334,7 +363,6 @@ qa = ConversationalRetrievalChain( retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator) ``` - ```python chat_history = [] query = "What did the president say about Ketanji Brown Jackson" @@ -349,7 +377,6 @@ result = qa({"question": query, "chat_history": chat_history}) - ```python chat_history = [(query, result["answer"])] query = "Did he mention who she succeeded" @@ -365,8 +392,8 @@ result = qa({"question": query, "chat_history": chat_history}) ## get_chat_history Function -You can also specify a `get_chat_history` function, which can be used to format the chat_history string. +You can also specify a `get_chat_history` function, which can be used to format the chat_history string. ```python def get_chat_history(inputs) -> str: @@ -377,14 +404,12 @@ def get_chat_history(inputs) -> str: qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), get_chat_history=get_chat_history) ``` - ```python chat_history = [] query = "What did the president say about Ketanji Brown Jackson" result = qa({"question": query, "chat_history": chat_history}) ``` - ```python result['answer'] ```
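A closing note that goes beyond the original page: if you combine a custom `get_chat_history` with the memory object used at the top of this page (created with `return_messages=True`), the chat history arrives as message objects rather than `(human, ai)` tuples, so the formatter has to read the message attributes instead. A minimal sketch under that assumption:

```python
# Sketch of a get_chat_history variant for message objects, e.g. when a
# ConversationBufferMemory with return_messages=True supplies the history.
def get_chat_history(messages) -> str:
    res = []
    for message in messages:
        # message.type is "human" or "ai"; message.content is the text
        res.append(f"{message.type.capitalize()}: {message.content}")
    return "\n".join(res)

qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    vectorstore.as_retriever(),
    memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True),
    get_chat_history=get_chat_history,
)
```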