diff --git a/docs/docs/guides/evaluation/index.mdx b/docs/docs/guides/evaluation/index.mdx
index d5cadc50fcb..955be3d2b7e 100644
--- a/docs/docs/guides/evaluation/index.mdx
+++ b/docs/docs/guides/evaluation/index.mdx
@@ -20,6 +20,21 @@ We also are working to share guides and cookbooks that demonstrate how to use th
 - [Chain Comparisons](/docs/guides/evaluation/examples/comparisons): This example uses a comparison evaluator to predict the preferred output. It reviews ways to measure confidence intervals to select statistically significant differences in aggregate preference scores across different models or prompts.
+
+## LangSmith Evaluation
+
+LangSmith provides an integrated evaluation and tracing framework that lets you check for regressions, compare systems, and quickly identify and fix sources of errors and performance issues. Check out the docs on [LangSmith Evaluation](https://docs.smith.langchain.com/category/testing--evaluation) and the additional [cookbooks](https://docs.smith.langchain.com/category/langsmith-cookbook) for more detailed information on evaluating your applications.
+
+## LangChain Benchmarks
+
+Your application quality is a function of both the LLM you choose and the prompting and data retrieval strategies you employ to provide model context. We have published a number of benchmark tasks within the [LangChain Benchmarks](https://langchain-ai.github.io/langchain-benchmarks/) package to grade different LLM systems on tasks such as:
+
+- Agent tool use
+- Retrieval-augmented question-answering
+- Structured extraction
+
+Check out the docs for examples and leaderboard information.
+
 ## Reference Docs
 
 For detailed information on the available evaluators, including how to instantiate, configure, and customize them, check out the [reference documentation](https://api.python.langchain.com/en/latest/api_reference.html#module-langchain.evaluation) directly.
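
The Reference Docs section mentions instantiating and configuring the built-in evaluators but the page never shows one in use. As a minimal sketch of the `langchain.evaluation` API, the snippet below loads an off-the-shelf criteria evaluator and grades a single prediction. The prompt/prediction strings are invented for illustration, and the default criteria evaluator falls back to an OpenAI chat model, so an `OPENAI_API_KEY` is assumed to be set.

```python
from langchain.evaluation import load_evaluator

# Load a built-in evaluator by name; "criteria" grades an output against a
# rubric such as conciseness. With no explicit `llm` argument it uses a
# default OpenAI chat model (OPENAI_API_KEY must be available).
evaluator = load_evaluator("criteria", criteria="conciseness")

# Grade one prediction against the original input. Both strings below are
# made-up examples, not from the docs page itself.
result = evaluator.evaluate_strings(
    prediction="Madrid. It has been the capital since the 16th century.",
    input="What is the capital of Spain? Answer in one word.",
)

# `result` is a dict-like payload typically containing a score, a value,
# and the grading model's reasoning.
print(result)
```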
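For the new LangSmith Evaluation section, a rough sketch of running evaluators over a LangSmith dataset is below, using `RunEvalConfig` and `run_on_dataset` from `langchain.smith`. It assumes a LangSmith account with `LANGCHAIN_API_KEY` set and an existing dataset; the dataset name `"my-eval-dataset"` and the choice of model are placeholders, not values from this PR.

```python
from langsmith import Client

from langchain.chat_models import ChatOpenAI
from langchain.smith import RunEvalConfig, run_on_dataset

# Configure which off-the-shelf evaluators to run on each example;
# "qa" checks the prediction against the dataset's reference answer.
eval_config = RunEvalConfig(evaluators=["qa"])

# The LangSmith client reads LANGCHAIN_API_KEY from the environment.
client = Client()

# Run the model over every example in the dataset and record the
# evaluator feedback in LangSmith. The dataset name is a placeholder.
run_on_dataset(
    client=client,
    dataset_name="my-eval-dataset",
    llm_or_chain_factory=ChatOpenAI(temperature=0),
    evaluation=eval_config,
)
```

The resulting test run then shows up in the LangSmith UI, where the regression checks and system comparisons described in the added section can be done across runs.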