Fix rephrase step in chatbot use case (#16763)

Jacob Lee 2024-01-29 23:25:25 -08:00 committed by GitHub
parent 546b757303
commit c6724a39f4
2 changed files with 32 additions and 34 deletions


@@ -105,7 +105,7 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"J'adore la programmation.\")"
"AIMessage(content=\"J'adore programmer.\")"
]
},
"execution_count": 3,
@@ -169,7 +169,7 @@
{
"data": {
"text/plain": [
"AIMessage(content='I said \"J\\'adore la programmation\" which means \"I love programming\" in French.')"
"AIMessage(content='I said \"J\\'adore la programmation,\" which means \"I love programming\" in French.')"
]
},
"execution_count": 5,
@@ -342,7 +342,7 @@
{
"data": {
"text/plain": [
"AIMessage(content='I said \"J\\'adore la programmation,\" which means \"I love programming\" in French.')"
"AIMessage(content='I said \"J\\'adore la programmation,\" which is the French translation for \"I love programming.\"')"
]
},
"execution_count": 10,
@@ -535,7 +535,7 @@
{
"data": {
"text/plain": [
"'LangSmith can help with testing in several ways:\\n\\n1. Dataset Expansion: LangSmith allows you to quickly edit examples and add them to datasets, which expands the surface area of your evaluation sets. This helps in testing a wider range of scenarios and inputs.\\n\\n2. Model Fine-Tuning: You can use LangSmith to fine-tune a model for improved quality or reduced costs, which is essential for testing changes and ensuring optimized performance.\\n\\n3. Monitoring: LangSmith can be used to monitor your application by logging all traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise. This monitoring capability is crucial for testing and evaluating the performance of your application in production.\\n\\n4. Construction of Datasets: LangSmith simplifies the process of constructing datasets, either by using existing datasets or by hand-crafting small datasets for rigorous testing of changes.\\n\\nOverall, LangSmith provides tools and capabilities that support thorough testing of applications and models, ultimately contributing to the reliability and performance of your systems.'"
"'LangSmith can assist with testing by providing the capability to quickly edit examples and add them to datasets. This allows for the expansion of evaluation sets or fine-tuning of a model for improved quality or reduced costs. Additionally, LangSmith simplifies the construction of small datasets by hand, providing a convenient way to rigorously test changes in the application.'"
]
},
"execution_count": 17,
@@ -606,7 +606,7 @@
" Document(page_content='inputs, and see what happens. At some point though, our application is performing\\nwell and we want to be more rigorous about testing changes. We can use a dataset\\nthat weve constructed along the way (see above). Alternatively, we could spend some\\ntime constructing a small dataset by hand. For these situations, LangSmith simplifies', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='have been building and using LangSmith with the goal of bridging this gap. This is our tactical user guide to outline effective ways to use LangSmith and maximize its benefits.On by default\\u200bAt LangChain, all of us have LangSmiths tracing running in the background by default. On the Python side, this is achieved by setting environment variables, which we establish whenever we launch a virtual environment or open our bash shell and leave them set. The same principle applies to most JavaScript', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'LangSmith can help with testing by simplifying the process of constructing and using datasets for testing and evaluation. It allows you to quickly edit examples and add them to datasets, thereby expanding the surface area of your evaluation sets. This can be useful for fine-tuning a model for improved quality or reduced costs. Additionally, LangSmith enables monitoring of your application by allowing you to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. This helps in rigorously testing changes and ensuring that your application performs well in production.'}"
" 'answer': 'LangSmith can help with testing in several ways:\\n\\n1. Dataset Expansion: LangSmith enables quick editing of examples and adding them to datasets, which expands the surface area of evaluation sets. This allows for more comprehensive testing of models and applications.\\n\\n2. Fine-Tuning Models: LangSmith facilitates the fine-tuning of models for improved quality or reduced costs. This is beneficial for optimizing the performance of models during testing.\\n\\n3. Monitoring: LangSmith can be used to monitor applications, log traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise during testing. This monitoring helps in ensuring the reliability and performance of the application during testing phases.\\n\\nOverall, LangSmith helps in making testing more rigorous and comprehensive, whether by expanding datasets, fine-tuning models, or monitoring application performance.'}"
]
},
"execution_count": 19,
@@ -633,13 +633,13 @@
"data": {
"text/plain": [
"{'messages': [HumanMessage(content='how can langsmith help with testing?'),\n",
" AIMessage(content='LangSmith can help with testing by simplifying the process of constructing and using datasets for testing and evaluation. It allows you to quickly edit examples and add them to datasets, thereby expanding the surface area of your evaluation sets. This can be useful for fine-tuning a model for improved quality or reduced costs. Additionally, LangSmith enables monitoring of your application by allowing you to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. This helps in rigorously testing changes and ensuring that your application performs well in production.'),\n",
" AIMessage(content='LangSmith can help with testing in several ways:\\n\\n1. Dataset Expansion: LangSmith enables quick editing of examples and adding them to datasets, which expands the surface area of evaluation sets. This allows for more comprehensive testing of models and applications.\\n\\n2. Fine-Tuning Models: LangSmith facilitates the fine-tuning of models for improved quality or reduced costs. This is beneficial for optimizing the performance of models during testing.\\n\\n3. Monitoring: LangSmith can be used to monitor applications, log traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise during testing. This monitoring helps in ensuring the reliability and performance of the application during testing phases.\\n\\nOverall, LangSmith helps in making testing more rigorous and comprehensive, whether by expanding datasets, fine-tuning models, or monitoring application performance.'),\n",
" HumanMessage(content='tell me more about that!')],\n",
" 'context': [Document(page_content='however, there is still no complete substitute for human review to get the utmost quality and reliability from your application.', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content=\"against these known issues.Why is this so impactful? When building LLM applications, its often common to start without a dataset of any kind. This is part of the power of LLMs! They are amazing zero-shot learners, making it possible to get started as easily as possible. But this can also be a curse -- as you adjust the prompt, you're wandering blind. You dont have any examples to benchmark your changes against.LangSmith addresses this problem by including an “Add to Dataset” button for each\", metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='playground. Here, you can modify the prompt and re-run it to observe the resulting changes to the output - as many times as needed!Currently, this feature supports only OpenAI and Anthropic models and works for LLM and Chat Model calls. We plan to extend its functionality to more LLM types, chains, agents, and retrievers in the future.What is the exact sequence of events?\\u200bIn complicated chains and agents, it can often be hard to understand what is going on under the hood. What calls are being', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'LangSmith facilitates testing by providing the capability to edit examples and add them to datasets, which in turn expands the range of scenarios for evaluating your application. This feature enables you to fine-tune your model for better quality and cost-effectiveness. Additionally, LangSmith allows for monitoring applications by logging traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise. This comprehensive testing and monitoring functionality ensure that your application is robust and performs optimally in a production environment.'}"
" 'answer': 'Certainly! LangSmith offers the following capabilities to aid in testing:\\n\\n1. Dataset Expansion: By allowing quick editing of examples and adding them to datasets, LangSmith enables the expansion of evaluation sets. This is crucial for thorough testing of models and applications, as it broadens the range of scenarios and inputs that can be used to assess performance.\\n\\n2. Fine-Tuning Models: LangSmith supports the fine-tuning of models to enhance their quality and reduce operational costs. This capability is valuable during testing as it enables the optimization of model performance based on specific testing requirements and objectives.\\n\\n3. Monitoring: LangSmith provides monitoring features that allow for the logging of traces, visualization of latency and token usage statistics, and troubleshooting of issues as they occur during testing. This real-time monitoring helps in identifying and addressing any issues that may impact the reliability and performance of the application during testing.\\n\\nBy leveraging these features, LangSmith enhances the testing process by enabling comprehensive dataset expansion, model fine-tuning, and real-time monitoring to ensure the quality and reliability of applications and models.'}"
]
},
"execution_count": 20,
@@ -676,7 +676,7 @@
{
"data": {
"text/plain": [
"'LangSmith provides a convenient way to modify prompts and re-run them to observe the resulting changes to the output. This \"Add to Dataset\" feature allows you to iteratively adjust the prompt and observe the impact on the output, enabling you to test and refine your prompts as many times as needed. This is particularly valuable when working with language model applications, as it helps address the challenge of starting without a dataset and allows for benchmarking changes against existing examples. Currently, this feature supports OpenAI and Anthropic models for LLM and Chat Model calls, with plans to extend its functionality to more LLM types, chains, agents, and retrievers in the future. Overall, LangSmith\\'s testing capabilities provide a valuable tool for refining and optimizing language model applications.'"
"\"LangSmith offers the capability to quickly edit examples and add them to datasets, thereby enhancing the scope of evaluation sets. This feature is particularly valuable for testing as it allows for a more thorough assessment of model performance and application behavior.\\n\\nFurthermore, LangSmith enables the fine-tuning of models to enhance quality and reduce costs, which can significantly impact testing outcomes. By adjusting and refining models, developers can ensure that they are thoroughly tested and optimized for various scenarios and use cases.\\n\\nAdditionally, LangSmith provides monitoring functionality, allowing users to log traces, visualize latency and token usage statistics, and troubleshoot specific issues as they encounter them during testing. This real-time monitoring and troubleshooting capability contribute to the overall effectiveness and reliability of the testing process.\\n\\nIn essence, LangSmith's features are designed to improve the quality and reliability of testing by expanding evaluation sets, fine-tuning models, and providing comprehensive monitoring capabilities. These aspects collectively contribute to a more robust and thorough testing process for applications and models.\""
]
},
"execution_count": 21,
@@ -705,7 +705,7 @@
"source": [
"## Query transformation\n",
"\n",
"There's more optimization we'll cover here - in the above example, when we asked a followup question, `tell me more about that!`, you might notice that the retrieved docs don't directly include information about testing. This is because we're passing `tell me more about that!` verbatim as a query to the retriever. The output in the retrieval chain is still okay because the document chain retrieval chain can generate an answer based on the chat history, but we could be retrieving more rich and informative documents:"
"There's one more optimization we'll cover here - in the above example, when we asked a followup question, `tell me more about that!`, you might notice that the retrieved docs don't directly include information about testing. This is because we're passing `tell me more about that!` verbatim as a query to the retriever. The output in the retrieval chain is still okay because the document chain retrieval chain can generate an answer based on the chat history, but we could be retrieving more rich and informative documents:"
]
},
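> Editor's note: the rephrase step this section builds toward relies on a prompt that asks the model to distill the whole conversation into a standalone search query. That prompt (`query_transform_prompt`, referenced in the code below) is defined outside the visible hunks; a minimal sketch of what it plausibly looks like, hedged since its definition isn't part of this diff:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Assumed definition -- not part of this diff. The prompt feeds the full chat
# history to the model and asks for a retriever-friendly search query back.
query_transform_prompt = ChatPromptTemplate.from_messages(
    [
        MessagesPlaceholder(variable_name="messages"),
        (
            "user",
            "Given the above conversation, generate a search query to look up "
            "in order to get information relevant to the conversation. "
            "Only respond with the query, nothing else.",
        ),
    ]
)
```

Piped through a chat model and a `StrOutputParser()`, this turns `tell me more about that!` plus the chat history into something like `LangSmith testing capabilities`, which retrieves far more relevant documents.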
{
@@ -786,13 +786,12 @@
"\n",
"query_transforming_retriever_chain = RunnableBranch(\n",
" (\n",
" # Both empty string and empty list evaluate to False\n",
" lambda x: not x.get(\"messages\", False),\n",
" # If no messages, then we just pass the last message's content to retriever\n",
" lambda x: len(x.get(\"messages\", [])) == 1,\n",
" # If only one message, then we just pass that message's content to retriever\n",
" (lambda x: x[\"messages\"][-1].content) | retriever,\n",
" ),\n",
" # If messages, then we pass inputs to LLM chain to transform the query, then pass to retriever\n",
" prompt | chat | StrOutputParser() | retriever,\n",
" query_transform_prompt | chat | StrOutputParser() | retriever,\n",
").with_config(run_name=\"chat_retriever_chain\")"
]
},
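> Editor's note: the heart of this commit is the branch condition. The old first-branch test, `not x.get("messages", False)`, was true only when the message list was empty, so even the very first user question flowed through the rephrase chain, contrary to what its comment claimed. The corrected `len(x.get("messages", [])) == 1` routes a single-message first turn straight to the retriever. A quick standalone check of both conditions:

```python
# Plain-Python comparison of the branch condition before and after this commit.
old_condition = lambda x: not x.get("messages", False)
new_condition = lambda x: len(x.get("messages", [])) == 1

for messages in ([], ["first question"], ["question", "answer", "followup"]):
    x = {"messages": messages}
    print(len(messages), old_condition(x), new_condition(x))

# 0 True False   <- the old branch fired only on empty input, which never occurs
# 1 False True   <- a first-turn question now skips the rephrase step, as intended
# 3 False False  <- multi-turn inputs still go through query transformation
```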
@@ -836,12 +835,12 @@
"data": {
"text/plain": [
"{'messages': [HumanMessage(content='how can langsmith help with testing?'),\n",
" AIMessage(content='LangSmith can help with testing by providing tracing capabilities that allow you to monitor and log all traces, visualize latency, and track token usage statistics. This can be invaluable for testing the performance and reliability of your prompts, chains, and agents. Additionally, LangSmith enables you to troubleshoot specific issues as they arise during testing, allowing for more effective debugging and optimization of your applications.')],\n",
" AIMessage(content='LangSmith can assist with testing in several ways. It allows you to quickly edit examples and add them to datasets, expanding the range of evaluation sets. This can help in fine-tuning a model for improved quality or reduced costs. Additionally, LangSmith simplifies the construction of small datasets by hand, providing a convenient way to rigorously test changes in your application. Furthermore, it enables monitoring of your application by logging all traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise.')],\n",
" 'context': [Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='inputs, and see what happens. At some point though, our application is performing\\nwell and we want to be more rigorous about testing changes. We can use a dataset\\nthat weve constructed along the way (see above). Alternatively, we could spend some\\ntime constructing a small dataset by hand. For these situations, LangSmith simplifies', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='LangSmith Overview and User Guide | 🦜️🛠️ LangSmith', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='have been building and using LangSmith with the goal of bridging this gap. This is our tactical user guide to outline effective ways to use LangSmith and maximize its benefits.On by default\\u200bAt LangChain, all of us have LangSmiths tracing running in the background by default. On the Python side, this is achieved by setting environment variables, which we establish whenever we launch a virtual environment or open our bash shell and leave them set. The same principle applies to most JavaScript', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'LangSmith can help with testing by providing tracing capabilities that allow you to monitor and log all traces, visualize latency, and track token usage statistics. This can be invaluable for testing the performance and reliability of your prompts, chains, and agents. Additionally, LangSmith enables you to troubleshoot specific issues as they arise during testing, allowing for more effective debugging and optimization of your applications.'}"
" 'answer': 'LangSmith can assist with testing in several ways. It allows you to quickly edit examples and add them to datasets, expanding the range of evaluation sets. This can help in fine-tuning a model for improved quality or reduced costs. Additionally, LangSmith simplifies the construction of small datasets by hand, providing a convenient way to rigorously test changes in your application. Furthermore, it enables monitoring of your application by logging all traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise.'}"
]
},
"execution_count": 26,
@@ -870,13 +869,13 @@
"data": {
"text/plain": [
"{'messages': [HumanMessage(content='how can langsmith help with testing?'),\n",
" AIMessage(content='LangSmith can help with testing by providing tracing capabilities that allow you to monitor and log all traces, visualize latency, and track token usage statistics. This can be invaluable for testing the performance and reliability of your prompts, chains, and agents. Additionally, LangSmith enables you to troubleshoot specific issues as they arise during testing, allowing for more effective debugging and optimization of your applications.'),\n",
" AIMessage(content='LangSmith can assist with testing in several ways. It allows you to quickly edit examples and add them to datasets, expanding the range of evaluation sets. This can help in fine-tuning a model for improved quality or reduced costs. Additionally, LangSmith simplifies the construction of small datasets by hand, providing a convenient way to rigorously test changes in your application. Furthermore, it enables monitoring of your application by logging all traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise.'),\n",
" HumanMessage(content='tell me more about that!')],\n",
" 'context': [Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='environments through process.env1.The benefit here is that all calls to LLMs, chains, agents, tools, and retrievers are logged to LangSmith. Around 90% of the time we dont even look at the traces, but the 10% of the time that we do… its so helpful. We can use LangSmith to debug:An unexpected end resultWhy an agent is loopingWhy a chain was slower than expectedHow many tokens an agent usedDebugging\\u200bDebugging LLMs, chains, and agents can be tough. LangSmith helps solve the following pain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" 'context': [Document(page_content='LangSmith Overview and User Guide | 🦜️🛠️ LangSmith', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='have been building and using LangSmith with the goal of bridging this gap. This is our tactical user guide to outline effective ways to use LangSmith and maximize its benefits.On by default\\u200bAt LangChain, all of us have LangSmiths tracing running in the background by default. On the Python side, this is achieved by setting environment variables, which we establish whenever we launch a virtual environment or open our bash shell and leave them set. The same principle applies to most JavaScript', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': \"Certainly! LangSmith's tracing capabilities allow you to monitor and log all traces of your application, providing visibility into the behavior and performance of your prompts, chains, and agents during testing. This can help you identify any unexpected end results, performance bottlenecks, or excessive token usage. By visualizing latency and token usage statistics, you can gain insights into the efficiency and resource consumption of your application, enabling you to make informed decisions about optimization and fine-tuning. Additionally, LangSmith's troubleshooting features empower you to address specific issues that may arise during testing, ultimately contributing to the reliability and quality of your applications.\"}"
" Document(page_content='inputs, and see what happens. At some point though, our application is performing\\nwell and we want to be more rigorous about testing changes. We can use a dataset\\nthat weve constructed along the way (see above). Alternatively, we could spend some\\ntime constructing a small dataset by hand. For these situations, LangSmith simplifies', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'Certainly! LangSmith simplifies the process of constructing and editing datasets, which is essential for testing and fine-tuning models. By quickly editing examples and adding them to datasets, you can expand the surface area of your evaluation sets, leading to improved model quality and potentially reduced costs. Additionally, LangSmith provides monitoring capabilities for your application, allowing you to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. This comprehensive monitoring functionality helps ensure the reliability and performance of your application in production.'}"
]
},
"execution_count": 27,
@@ -896,9 +895,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To help you understand what's happening internally, [this LangSmith trace](https://smith.langchain.com/public/e8b88115-d8d8-4332-b987-7c3003234746/r) shows the first invocation. You can see that the user's initial query is passed directly to the retriever, which return suitable docs.\n",
"To help you understand what's happening internally, [this LangSmith trace](https://smith.langchain.com/public/42f8993b-7d19-42d3-990a-6608a73c5824/r) shows the first invocation. You can see that the user's initial query is passed directly to the retriever, which return suitable docs.\n",
"\n",
"The invocation for followup question, [illustrated by this LangSmith trace](https://smith.langchain.com/public/ebb566ad-b69d-496e-b307-66bf70224c31/r) rephrases the user's initial question to something more relevant to testing with LangSmith, resulting in higher quality docs.\n",
"The invocation for followup question, [illustrated by this LangSmith trace](https://smith.langchain.com/public/7b463791-868b-42bd-8035-17b471e9c7cd/r) rephrases the user's initial question to something more relevant to testing with LangSmith, resulting in higher quality docs.\n",
"\n",
"And we now have a chatbot capable of conversational retrieval!\n",
"\n",


@@ -265,7 +265,7 @@
{
"data": {
"text/plain": [
"\"I don't know about LangSmith's specific capabilities for testing LLM applications. It's best to directly reach out to LangSmith or check their website for information on their services related to LLM application testing.\""
"\"I don't know about LangSmith's specific capabilities for testing LLM applications. It's best to reach out to LangSmith directly to inquire about their services and how they can assist with testing your LLM applications.\""
]
},
"execution_count": 9,
@@ -339,7 +339,7 @@
" Document(page_content='LangSmith Overview and User Guide | 🦜️🛠️ LangSmith', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content=\"does that affect the output?\\u200bSo you notice a bad output, and you go into LangSmith to see what's going on. You find the faulty LLM call and are now looking at the exact input. You want to try changing a word or a phrase to see what happens -- what do you do?We constantly ran into this issue. Initially, we copied the prompt to a playground of sorts. But this got annoying, so we built a playground of our own! When examining an LLM call, you can click the Open in Playground button to access this\", metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'Yes, LangSmith can help test and evaluate your LLM applications. It simplifies the initial setup and provides features for monitoring your application, logging all traces, visualizing latency and token usage statistics, and troubleshooting specific issues. It can also be used to edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.'}"
" 'answer': 'Yes, LangSmith can help test and evaluate your LLM applications. It simplifies the initial setup, and you can use it to monitor your application, log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise.'}"
]
},
"execution_count": 11,
@@ -466,9 +466,8 @@
"\n",
"query_transforming_retriever_chain = RunnableBranch(\n",
" (\n",
" # Both empty string and empty list evaluate to False\n",
" lambda x: not x.get(\"messages\", False),\n",
" # If no messages, then we just pass the last message's content to retriever\n",
" lambda x: len(x.get(\"messages\", [])) == 1,\n",
" # If only one message, then we just pass that message's content to retriever\n",
" (lambda x: x[\"messages\"][-1].content) | retriever,\n",
" ),\n",
" # If messages, then we pass inputs to LLM chain to transform the query, then pass to retriever\n",
@@ -533,11 +532,11 @@
"data": {
"text/plain": [
"{'messages': [HumanMessage(content='Can LangSmith help test my LLM applications?')],\n",
" 'context': [Document(page_content='LangSmith Overview and User Guide | 🦜️🛠️ LangSmith', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" 'context': [Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='LangSmith Overview and User Guide | 🦜️🛠️ LangSmith', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='datasets\\u200bLangSmith makes it easy to curate datasets. However, these arent just useful inside LangSmith; they can be exported for use in other contexts. Notable applications include exporting for use in OpenAI Evals or fine-tuning, such as with FireworksAI.To set up tracing in Deno, web browsers, or other runtime environments without access to the environment, check out the FAQs.↩PreviousLangSmithNextTracingOn by defaultDebuggingWhat was the exact input to the LLM?If I edit the prompt, how does', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'Yes, LangSmith can help test and evaluate your LLM applications. It provides tools for monitoring your application, logging traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise. Additionally, you can quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.'}"
" Document(page_content=\"does that affect the output?\\u200bSo you notice a bad output, and you go into LangSmith to see what's going on. You find the faulty LLM call and are now looking at the exact input. You want to try changing a word or a phrase to see what happens -- what do you do?We constantly ran into this issue. Initially, we copied the prompt to a playground of sorts. But this got annoying, so we built a playground of our own! When examining an LLM call, you can click the Open in Playground button to access this\", metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'Yes, LangSmith can help test and evaluate LLM (Language Model) applications. It simplifies the initial setup, and you can use it to monitor your application, log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise.'}"
]
},
"execution_count": 16,
@@ -570,7 +569,7 @@
" Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='LangSmith makes it easy to manually review and annotate runs through annotation queues.These queues allow you to select any runs based on criteria like model type or automatic evaluation scores, and queue them up for human review. As a reviewer, you can then quickly step through the runs, viewing the input, output, and any existing tags before adding your own feedback.We often use this for a couple of reasons:To assess subjective qualities that automatic evaluators struggle with, like', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'LangSmith simplifies the initial setup for building reliable LLM applications. It provides features for manually reviewing and annotating runs through annotation queues, allowing you to select runs based on criteria like model type or automatic evaluation scores, and queue them up for human review. As a reviewer, you can quickly step through the runs, view the input, output, and any existing tags before adding your own feedback. This can be especially useful for assessing subjective qualities that automatic evaluators struggle with.'}"
" 'answer': 'LangSmith simplifies the initial setup for building reliable LLM applications, but it acknowledges that there is still work needed to bring the performance of prompts, chains, and agents up to the level where they are reliable enough to be used in production. It also provides the capability to manually review and annotate runs through annotation queues, allowing you to select runs based on criteria like model type or automatic evaluation scores for human review. This feature is particularly useful for assessing subjective qualities that automatic evaluators struggle with.'}"
]
},
"execution_count": 17,
@@ -596,7 +595,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can check out [this LangSmith trace](https://smith.langchain.com/public/8a4b8ab0-f2dc-4b50-9473-6c6c0c9c289d/r) to see the internal query transformation step for yourself.\n",
"You can check out [this LangSmith trace](https://smith.langchain.com/public/bb329a3b-e92a-4063-ad78-43f720fbb5a2/r) to see the internal query transformation step for yourself.\n",
"\n",
"## Streaming\n",
"\n",