DOC: Tutorial Section Updates (#27675)

Edited various notebooks in the tutorial section to:
* Fix grammatical errors
* Improve readability by restructuring sentences and removing repeated words that carry the same meaning
* Update a code block to follow the PEP 8 standard
* Add more detail to some sentences to make concepts clearer and reduce ambiguity

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Zapiron 2024-10-29 22:51:34 +08:00 committed by GitHub
parent c1d8c33df6
commit 9ccd4a6ffb
4 changed files with 44 additions and 44 deletions

View File

@ -33,7 +33,7 @@
"\n",
"By themselves, language models can't take actions - they just output text.\n",
"A big use case for LangChain is creating **agents**.\n",
"Agents are systems that use LLMs as reasoning engines to determine which actions to take and the inputs to pass them.\n",
"Agents are systems that use LLMs as reasoning engines to determine which actions to take and the inputs necessary to perform the action.\n",
"After executing actions, the results can be fed back into the LLM to determine whether more actions are needed, or whether it is okay to finish.\n",
"\n",
"In this tutorial we will build an agent that can interact with a search engine. You will be able to ask this agent questions, watch it call the search tool, and have conversations with it.\n",
@ -371,7 +371,7 @@
"## Create the agent\n",
"\n",
"Now that we have defined the tools and the LLM, we can create the agent. We will be using [LangGraph](/docs/concepts/#langgraph) to construct the agent. \n",
"Currently we are using a high level interface to construct the agent, but the nice thing about LangGraph is that this high-level interface is backed by a low-level, highly controllable API in case you want to modify the agent logic.\n"
"Currently, we are using a high level interface to construct the agent, but the nice thing about LangGraph is that this high-level interface is backed by a low-level, highly controllable API in case you want to modify the agent logic.\n"
]
},
{
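For readers who want to see what that high-level interface looks like, here is a minimal sketch (assuming `model` is the chat model and `tools` is the tool list defined earlier in this notebook):

```python
from langgraph.prebuilt import create_react_agent

# Pass the raw model, not model_with_tools: create_react_agent binds the tools itself.
agent_executor = create_react_agent(model, tools)
```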
@ -403,7 +403,7 @@
"source": [
"## Run the agent\n",
"\n",
"We can now run the agent on a few queries! Note that for now, these are all **stateless** queries (it won't remember previous interactions). Note that the agent will return the **final** state at the end of the interaction (which includes any inputs, we will see later on how to get only the outputs).\n",
"We can now run the agent with a few queries! Note that for now, these are all **stateless** queries (it won't remember previous interactions). Note that the agent will return the **final** state at the end of the interaction (which includes any inputs, we will see later on how to get only the outputs).\n",
"\n",
"First up, let's see how it responds when there's no need to call a tool:"
]
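A minimal sketch of one of these stateless calls (assuming the `agent_executor` created above):

```python
from langchain_core.messages import HumanMessage

# Stateless: nothing carries over between invocations.
response = agent_executor.invoke({"messages": [HumanMessage(content="hi!")]})
response["messages"]  # the final state: the input message plus the model's reply
```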
@ -484,7 +484,7 @@
"source": [
"## Streaming Messages\n",
"\n",
"We've seen how the agent can be called with `.invoke` to get back a final response. If the agent is executing multiple steps, that may take a while. In order to show intermediate progress, we can stream back messages as they occur."
"We've seen how the agent can be called with `.invoke` to get a final response. If the agent executes multiple steps, this may take a while. To show intermediate progress, we can stream back messages as they occur."
]
},
{
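A sketch of streaming those intermediate messages (same `agent_executor` as above):

```python
from langchain_core.messages import HumanMessage

# Each yielded chunk corresponds to one agent step (a model call, a tool call, ...).
for chunk in agent_executor.stream(
    {"messages": [HumanMessage(content="whats the weather in sf?")]}
):
    print(chunk)
    print("----")
```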
@ -521,7 +521,7 @@
"source": [
"## Streaming tokens\n",
"\n",
"In addition to streaming back messages, it is also useful to be streaming back tokens.\n",
"In addition to streaming back messages, it is also useful to stream back tokens.\n",
"We can do this with the `.astream_events` method.\n",
"\n",
":::important\n",
@ -680,7 +680,7 @@
"id": "ae908088",
"metadata": {},
"source": [
"If I want to start a new conversation, all I have to do is change the `thread_id` used"
"If you want to start a new conversation, all you have to do is change the `thread_id` used"
]
},
{
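A sketch of how the `thread_id` controls this, assuming the agent was compiled with a checkpointer (e.g. `MemorySaver`) as in the tutorial:

```python
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

memory = MemorySaver()
agent_executor = create_react_agent(model, tools, checkpointer=memory)

# Invocations that share a thread_id share conversational memory.
config = {"configurable": {"thread_id": "abc123"}}
agent_executor.invoke({"messages": [HumanMessage(content="hi, I'm Bob!")]}, config)

# A new thread_id starts a fresh conversation with no memory of the first one.
new_config = {"configurable": {"thread_id": "xyz789"}}
agent_executor.invoke({"messages": [HumanMessage(content="what's my name?")]}, new_config)
```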
@ -715,9 +715,9 @@
"## Conclusion\n",
"\n",
"That's a wrap! In this quick start we covered how to create a simple agent. \n",
"We've then shown how to stream back a response - not only the intermediate steps, but also tokens!\n",
"We've then shown how to stream back a response - not only with the intermediate steps, but also tokens!\n",
"We've also added in memory so you can have a conversation with them.\n",
"Agents are a complex topic, and there's lot to learn! \n",
"Agents are a complex topic with lots to learn! \n",
"\n",
"For more information on Agents, please check out the [LangGraph](/docs/concepts/#langgraph) documentation. This has it's own set of concepts, tutorials, and how-to guides."
]

View File

@ -22,11 +22,11 @@
"\n",
"Tagging means labeling a document with classes such as:\n",
"\n",
"- sentiment\n",
"- language\n",
"- style (formal, informal etc.)\n",
"- covered topics\n",
"- political tendency\n",
"- Sentiment\n",
"- Language\n",
"- Style (formal, informal etc.)\n",
"- Covered topics\n",
"- Political tendency\n",
"\n",
"![Image description](../../static/img/tagging.png)\n",
"\n",
@ -130,7 +130,7 @@
"id": "ff3cf30d",
"metadata": {},
"source": [
"If we want JSON output, we can just call `.dict()`"
"If we want dictionary output, we can just call `.dict()`"
]
},
{
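A minimal, self-contained sketch of the idea (the `Classification` schema and example text here are illustrative, not the notebook's exact ones):

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    language: str = Field(description="The language the text is written in")


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0).with_structured_output(Classification)

response = llm.invoke("Estoy increiblemente contento de haberte conocido!")
response.dict()  # on Pydantic v2, .model_dump() does the same without a deprecation warning
```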
@ -179,9 +179,9 @@
"\n",
"Specifically, we can define:\n",
"\n",
"- possible values for each property\n",
"- description to make sure that the model understands the property\n",
"- required properties to be returned"
"- Possible values for each property\n",
"- Description to make sure that the model understands the property\n",
"- Required properties to be returned"
]
},
{
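A sketch of such a constrained schema (the enum values and descriptions are illustrative; the notebook's actual schema may differ):

```python
from pydantic import BaseModel, Field


class Classification(BaseModel):
    # Possible values are restricted via enum, which is passed through to the JSON schema the model sees.
    sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
    aggressiveness: int = Field(
        ...,
        description="describes how aggressive the statement is; the higher the number, the more aggressive",
        enum=[1, 2, 3, 4, 5],
    )
    language: str = Field(
        ..., enum=["spanish", "english", "french", "german", "italian"]
    )
```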

View File

@ -37,7 +37,7 @@
"\n",
"## Quickstart\n",
"\n",
"In this notebook, we'll dive deep into generating synthetic medical billing records using the langchain library. This tool is particularly useful when you want to develop or test algorithms but don't want to use real patient data due to privacy concerns or data availability issues."
"In this notebook, we'll dive deep into generating synthetic medical billing records using the `langchain` library. This tool is particularly useful when you want to develop or test algorithms but don't want to use real patient data due to privacy concerns or data availability issues."
]
},
{
@ -46,7 +46,7 @@
"metadata": {},
"source": [
"### Setup\n",
"First, you'll need to have the langchain library installed, along with its dependencies. Since we're using the OpenAI generator chain, we'll install that as well. Since this is an experimental lib, we'll need to include `langchain_experimental` in our installs. We'll then import the necessary modules."
"First, you'll need to have the `langchain` library installed, along with its dependencies. Since we're using the OpenAI generator chain, we'll install that as well. Since this is an experimental library, we'll need to include `langchain_experimental` in our installation. We'll then import the necessary modules."
]
},
{
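A sketch of that setup step (package names assumed from the libraries used later in this notebook):

```python
%pip install --upgrade --quiet langchain langchain_experimental langchain-openai
```

You'll also need an OpenAI API key available as the `OPENAI_API_KEY` environment variable, since the generator chain calls OpenAI.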
@ -99,7 +99,7 @@
"metadata": {},
"source": [
"## 1. Define Your Data Model\n",
"Every dataset has a structure or a \"schema\". The MedicalBilling class below serves as our schema for the synthetic data. By defining this, we're informing our synthetic data generator about the shape and nature of data we expect."
"Every dataset has a structure or a \"schema\". The `MedicalBilling` class below serves as our schema for the synthetic data. By defining this, we're informing our synthetic data generator about the shape and nature of data we expect."
]
},
{
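A sketch of such a schema, with the field names and types described just below (adjust as needed):

```python
from pydantic import BaseModel


class MedicalBilling(BaseModel):
    patient_id: int
    patient_name: str
    diagnosis_code: str
    procedure_code: str
    total_charge: float
    insurance_claim_amount: float
```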
@ -126,7 +126,7 @@
"For instance, every record will have a `patient_id` that's an integer, a `patient_name` that's a string, and so on.\n",
"\n",
"## 2. Sample Data\n",
"To guide the synthetic data generator, it's useful to provide it with a few real-world-like examples. These examples serve as a \"seed\" - they're representative of the kind of data you want, and the generator will use them to create more data that looks similar.\n",
"To guide the synthetic data generator, it's useful to provide it with a few real-world-like examples. These examples serve as a \"seed\" - they're representative of the kind of data you want, and the generator will use them to create data that looks similar to your expectations.\n",
"\n",
"Here are some fictional medical billing records:"
]
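A sketch of how such seed examples are typically supplied, as dictionaries with a single "example" string (the values below are made up):

```python
examples = [
    {
        "example": "Patient ID: 123456, Patient Name: John Doe, Diagnosis Code: J20.9, "
        "Procedure Code: 99203, Total Charge: $500, Insurance Claim Amount: $350"
    },
    {
        "example": "Patient ID: 789012, Patient Name: Johnson Smith, Diagnosis Code: M54.5, "
        "Procedure Code: 99213, Total Charge: $150, Insurance Claim Amount: $120"
    },
]
```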
@ -194,7 +194,7 @@
"- `example_prompt`: This prompt template is the format we want each example row to take in our prompt.\n",
"\n",
"## 4. Creating the Data Generator\n",
"With the schema and the prompt ready, the next step is to create the data generator. This object knows how to communicate with the underlying language model to get synthetic data."
"With the schema and the prompt ready, the next step is to create the data generator. This object knows how to communicate with the underlying language model to generate synthetic data."
]
},
{
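A sketch of wiring the generator together (assuming the `MedicalBilling` schema and `examples` list above; the `langchain_experimental` module paths are assumptions and may vary between versions):

```python
from langchain.prompts import FewShotPromptTemplate
from langchain_experimental.tabular_synthetic_data.openai import (
    OPENAI_TEMPLATE,
    create_openai_data_generator,
)
from langchain_experimental.tabular_synthetic_data.prompts import (
    SYNTHETIC_FEW_SHOT_PREFIX,
    SYNTHETIC_FEW_SHOT_SUFFIX,
)
from langchain_openai import ChatOpenAI

prompt_template = FewShotPromptTemplate(
    prefix=SYNTHETIC_FEW_SHOT_PREFIX,
    examples=examples,
    suffix=SYNTHETIC_FEW_SHOT_SUFFIX,
    input_variables=["subject", "extra"],
    example_prompt=OPENAI_TEMPLATE,
)

synthetic_data_generator = create_openai_data_generator(
    output_schema=MedicalBilling,
    llm=ChatOpenAI(temperature=1),  # a higher temperature gives more varied records
    prompt=prompt_template,
)
```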
@ -219,7 +219,7 @@
"metadata": {},
"source": [
"## 5. Generate Synthetic Data\n",
"Finally, let's get our synthetic data!"
"Finally, let's generate our synthetic data!"
]
},
{
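A sketch of the generation call (the `subject` and `extra` strings are illustrative):

```python
synthetic_results = synthetic_data_generator.generate(
    subject="medical_billing",
    extra="the name must be chosen at random; make it something you wouldn't normally pick",
    runs=10,  # produce 10 synthetic records
)
```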
@ -241,7 +241,7 @@
"id": "fa4402e9",
"metadata": {},
"source": [
"This command asks the generator to produce 10 synthetic medical billing records. The results are stored in `synthetic_results`. The output will be a list of the MedicalBilling pydantic models."
"This command asks the generator to produce 10 synthetic medical billing records. The results are stored in `synthetic_results`. The output will be a list of the `MedicalBilling` pydantic model."
]
},
{
@ -400,7 +400,7 @@
"id": "93c7a4bb",
"metadata": {},
"source": [
"As we can see created examples are diversified and possess information we wanted them to have. Also, their style reflects the given preferences quite well."
"As we can see, the created examples are diversified and possess information we wanted them to have. Also, their style reflects our given preferences quite well."
]
},
{

View File

@ -23,7 +23,7 @@
"\n",
"RAG is a technique for augmenting LLM knowledge with additional data.\n",
"\n",
"LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of bringing the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).\n",
"LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of bringing and inserting appropriate information into the model prompt is known as Retrieval Augmented Generation (RAG).\n",
"\n",
"LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally. \n",
"\n",
@ -40,14 +40,14 @@
"\n",
"### Indexing\n",
"1. **Load**: First we need to load our data. This is done with [Document Loaders](/docs/concepts/document_loaders).\n",
"2. **Split**: [Text splitters](/docs/concepts/text_splitters) break large `Documents` into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.\n",
"3. **Store**: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a [VectorStore](/docs/concepts/vectorstores) and [Embeddings](/docs/concepts/embedding_models) model.\n",
"2. **Split**: [Text splitters](/docs/concepts/text_splitters) break large `Documents` into smaller chunks. This is useful both for indexing data and passing it into a model, as large chunks are harder to search over and won't fit in a model's finite context window.\n",
"3. **Store**: We need somewhere to store and index our splits, so that they can be searched over later. This is often done using a [VectorStore](/docs/concepts/vectorstores) and [Embeddings](/docs/concepts/embedding_models) model.\n",
"\n",
"![index_diagram](../../static/img/rag_indexing.png)\n",
"\n",
"### Retrieval and generation\n",
"4. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a [Retriever](/docs/concepts/retrievers).\n",
"5. **Generate**: A [ChatModel](/docs/concepts/chat_models) / [LLM](/docs/concepts/text_llms) produces an answer using a prompt that includes the question and the retrieved data\n",
"5. **Generate**: A [ChatModel](/docs/concepts/chat_models) / [LLM](/docs/concepts/text_llms) produces an answer using a prompt that includes both the question with the retrieved data\n",
"\n",
"![retrieval_diagram](../../static/img/rag_retrieval_generation.png)\n",
"\n",
@ -56,7 +56,7 @@
"\n",
"### Jupyter Notebook\n",
"\n",
"This guide (and most of the other guides in the documentation) uses [Jupyter notebooks](https://jupyter.org/) and assumes the reader is as well. Jupyter notebooks are perfect for learning how to work with LLM systems because oftentimes things can go wrong (unexpected output, API down, etc) and going through guides in an interactive environment is a great way to better understand them.\n",
"This guide (and most of the other guides in the documentation) uses [Jupyter notebooks](https://jupyter.org/) and assumes the reader is as well. Jupyter notebooks are perfect for learning how to work with LLM systems because oftentimes things can go wrong (unexpected output, API down, etc) and going through guides in an interactive environment is a great way to better understand LLM concepts.\n",
"\n",
"This and other tutorials are perhaps most conveniently run in a Jupyter notebook. See [here](https://jupyter.org/install) for instructions on how to install.\n",
"\n",
@ -100,7 +100,7 @@
"### LangSmith\n",
"\n",
"Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls.\n",
"As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent.\n",
"As these applications get more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent.\n",
"The best way to do this is with [LangSmith](https://smith.langchain.com).\n",
"\n",
"After you sign up at the link above, make sure to set your environment variables to start logging traces:\n",
@ -121,7 +121,7 @@
"```\n",
"## Preview\n",
"\n",
"In this guide well build an app that answers questions about the content of a website. The specific website we will use is the [LLM Powered Autonomous\n",
"In this guide well build an app that answers questions about the website's content. The specific website we will use is the [LLM Powered Autonomous\n",
"Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post\n",
"by Lilian Weng, which allows us to ask questions about the contents of\n",
"the post.\n",
@ -248,7 +248,7 @@
"[WebBaseLoader](/docs/integrations/document_loaders/web_base),\n",
"which uses `urllib` to load HTML from web URLs and `BeautifulSoup` to\n",
"parse it to text. We can customize the HTML -\\> text parsing by passing\n",
"in parameters to the `BeautifulSoup` parser via `bs_kwargs` (see\n",
"in parameters into the `BeautifulSoup` parser via `bs_kwargs` (see\n",
"[BeautifulSoup\n",
"docs](https://beautiful-soup-4.readthedocs.io/en/latest/#beautifulsoup)).\n",
"In this case only HTML tags with class “post-content”, “post-title”, or\n",
@ -329,19 +329,19 @@
"- [Integrations](/docs/integrations/document_loaders/): 160+\n",
" integrations to choose from.\n",
"- [Interface](https://python.langchain.com/api_reference/core/document_loaders/langchain_core.document_loaders.base.BaseLoader.html):\n",
" API reference  for the base interface.\n",
" API reference for the base interface.\n",
"\n",
"\n",
"## 2. Indexing: Split {#indexing-split}\n",
"\n",
"\n",
"Our loaded document is over 42k characters long. This is too long to fit\n",
"in the context window of many models. Even for those models that could\n",
"Our loaded document is over 42k characters which is too long to fit\n",
"into the context window of many models. Even for those models that could\n",
"fit the full post in their context window, models can struggle to find\n",
"information in very long inputs.\n",
"\n",
"To handle this well split the `Document` into chunks for embedding and\n",
"vector storage. This should help us retrieve only the most relevant bits\n",
"vector storage. This should help us retrieve only the most relevant parts\n",
"of the blog post at run time.\n",
"\n",
"In this case well split our documents into chunks of 1000 characters\n",
@ -353,7 +353,7 @@
"new lines until each chunk is the appropriate size. This is the\n",
"recommended text splitter for generic text use cases.\n",
"\n",
"We set `add_start_index=True` so that the character index at which each\n",
"We set `add_start_index=True` so that the character index where each\n",
"split Document starts within the initial Document is preserved as\n",
"metadata attribute “start_index”."
]
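A sketch of that splitting step, with the parameters described above:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # characters per chunk
    chunk_overlap=200,     # overlap between consecutive chunks
    add_start_index=True,  # record where each chunk starts in the original Document
)
all_splits = text_splitter.split_documents(docs)
len(all_splits)
```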
@ -595,7 +595,7 @@
" [variants of the\n",
" embeddings](/docs/how_to/multi_vector),\n",
" also in order to improve retrieval hit rate.\n",
" - `Max marginal relevance` selects for [relevance and\n",
" - `Maximal marginal relevance` selects for [relevance and\n",
" diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf)\n",
" among the retrieved documents to avoid passing in duplicate\n",
" context.\n",
@ -610,7 +610,7 @@
"## 5. Retrieval and Generation: Generate {#retrieval-and-generation-generate}\n",
"\n",
"Lets put it all together into a chain that takes a question, retrieves\n",
"relevant documents, constructs a prompt, passes that to a model, and\n",
"relevant documents, constructs a prompt, passes it into a model, and\n",
"parses the output.\n",
"\n",
"Well use the gpt-4o-mini OpenAI chat model, but any LangChain `LLM`\n",
@ -733,7 +733,7 @@
"\n",
"First: each of these components (`retriever`, `prompt`, `llm`, etc.) are instances of [Runnable](/docs/concepts#langchain-expression-language-lcel). This means that they implement the same methods-- such as sync and async `.invoke`, `.stream`, or `.batch`-- which makes them easier to connect together. They can be connected into a [RunnableSequence](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.RunnableSequence.html)-- another Runnable-- via the `|` operator.\n",
"\n",
"LangChain will automatically cast certain objects to runnables when met with the `|` operator. Here, `format_docs` is cast to a [RunnableLambda](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.RunnableLambda.html), and the dict with `\"context\"` and `\"question\"` is cast to a [RunnableParallel](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.RunnableParallel.html). The details are less important than the bigger point, which is that each object is a Runnable.\n",
"LangChain will automatically cast certain objects to runnables when met with the `|` operator. Here, `format_docs` is cast to a [RunnableLambda](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.RunnableLambda.html), and the dict with `\"context\"` and `\"question\"` is cast to a [RunnableParallel](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.RunnableParallel.html). The details are less important than the bigger point, which is that each object in the chain is a Runnable.\n",
"\n",
"Let's trace how the input question flows through the above runnables.\n",
"\n",
@ -757,7 +757,7 @@
"\n",
"### Built-in chains\n",
"\n",
"If preferred, LangChain includes convenience functions that implement the above LCEL. We compose two functions:\n",
"If preferred, LangChain includes convenient functions that implement the above LCEL. We compose two functions:\n",
"\n",
"- [create_stuff_documents_chain](https://python.langchain.com/api_reference/langchain/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html) specifies how retrieved context is fed into a prompt and LLM. In this case, we will \"stuff\" the contents into the prompt -- i.e., we will include all retrieved context without any summarization or other processing. It largely implements our above `rag_chain`, with input keys `context` and `input`-- it generates an answer using retrieved context and query.\n",
"- [create_retrieval_chain](https://python.langchain.com/api_reference/langchain/chains/langchain.chains.retrieval.create_retrieval_chain.html) adds the retrieval step and propagates the retrieved context through the chain, providing it alongside the final answer. It has input key `input`, and includes `input`, `context`, and `answer` in its output."
@ -813,7 +813,7 @@
"metadata": {},
"source": [
"#### Returning sources\n",
"Often in Q&A applications it's important to show users the sources that were used to generate the answer. LangChain's built-in `create_retrieval_chain` will propagate retrieved source documents through to the output in the `\"context\"` key:"
"Often in Q&A applications it's important to show users the sources that were used to generate the answer. LangChain's built-in `create_retrieval_chain` will propagate retrieved source documents to the output under the `\"context\"` key:"
]
},
{
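A sketch of reading those sources back out of the response (assuming the `rag_chain` built with `create_retrieval_chain` above):

```python
# The retrieved source Documents ride along under the "context" key of the output.
for document in response["context"]:
    print(document.metadata)
```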
@ -940,7 +940,7 @@
"- Generating an answer using the retrieved chunks as context\n",
"\n",
"Theres plenty of features, integrations, and extensions to explore in each of\n",
"the above sections. Along from the **Go deeper** sources mentioned\n",
"the above sections. Along with the **Go deeper** sources mentioned\n",
"above, good next steps include:\n",
"\n",
"- [Return sources](/docs/how_to/qa_sources): Learn how to return source documents\n",