diff --git a/docs/docs/integrations/document_loaders/amazon_textract.ipynb b/docs/docs/integrations/document_loaders/amazon_textract.ipynb
index 71da1059ea1..9eb475180be 100644
--- a/docs/docs/integrations/document_loaders/amazon_textract.ipynb
+++ b/docs/docs/integrations/document_loaders/amazon_textract.ipynb
@@ -13,7 +13,7 @@
  "\n",
  "`Textract` supports `JPEG`, `PNG`, `PDF`, and `TIFF` file formats; more information is available in [the documentation](https://docs.aws.amazon.com/textract/latest/dg/limits-document.html).\n",
  "\n",
- "The following samples demonstrate the use of `Amazon Textract` in combination with LangChain as a DocumentLoader."
+ "The following examples demonstrate the use of `Amazon Textract` in combination with LangChain as a DocumentLoader."
  ]
  },
  {
@@ -41,7 +41,7 @@
  "id": "400b25c6-befa-4730-a201-39ff112c8858",
  "metadata": {},
  "source": [
- "## Sample 1\n",
+ "## Example 1: Loading from a local file\n",
  "\n",
  "The first example uses a local file, which internally will be sent to Amazon Textract sync API [DetectDocumentText](https://docs.aws.amazon.com/textract/latest/dg/API_DetectDocumentText.html). \n",
  "\n",
@@ -100,8 +100,8 @@
  "id": "4cf7f19c-3635-453a-9c76-4baf98b8d7f4",
  "metadata": {},
  "source": [
- "## Sample 2\n",
- "The next sample loads a file from an HTTPS endpoint. \n",
+ "## Example 2: Loading from a URL\n",
+ "The next example loads a file from an HTTPS endpoint. \n",
  "It has to be single page, as Amazon Textract requires all multi-page documents to be stored on S3."
  ]
  },
@@ -150,7 +150,7 @@
  "id": "3a9cd8ec-e663-4dc7-9db1-d2f575253141",
  "metadata": {},
  "source": [
- "## Sample 3\n",
+ "## Example 3: Loading multi-page PDF documents\n",
  "\n",
  "Processing a multi-page document requires the document to be on S3. The sample document resides in a bucket in us-east-2 and Textract needs to be called in that same region to be successful, so we set the region_name on the client and pass that in to the loader to ensure Textract is called from us-east-2. You could also to have your notebook running in us-east-2, setting the AWS_DEFAULT_REGION set to us-east-2 or when running in a different environment, pass in a boto3 Textract client with that region name like in the cell below."
  ]
@@ -214,7 +214,7 @@
  }
  },
  "source": [
- "## Sample 4\n",
+ "## Example 4: Customizing the output format\n",
  "\n",
  "You have the option to pass an additional parameter called `linearization_config` to the AmazonTextractPDFLoader which will determine how the text output will be linearized by the parser after Textract runs."
  ]
@@ -248,7 +248,7 @@
  "## Using the AmazonTextractPDFLoader in a LangChain chain (e.g. OpenAI)\n",
  "\n",
  "The AmazonTextractPDFLoader can be used in a chain the same way the other loaders are used.\n",
- "Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this sample, which is worth checking out as well."
+ "Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this example, which is worth checking out as well."
  ]
  },
  {