docs: update titles for Textract examples (#32063)

**In this PR I am doing two things:**

1. Adding titles to the 4 examples we have, so the reader can
grasp the essence of each section quickly
2. Replacing 'samples' with 'examples', for more clarity

**Why might 'examples' be better terminology than 'samples' here?**
1. On the page, we were using both 'samples' and 'examples'
interchangeably, which led to confusion; now 'examples' are the use
cases, while 'samples' are the sample data being used
2. This is consistent with the rest of the docs, where we typically use
'examples' for worked use cases, for instance
https://python.langchain.com/docs/integrations/callbacks/fiddler/
Ahmad Elmalah 2025-07-16 17:17:02 +03:00 committed by GitHub
parent ad44f0688b
commit 2ab2cab203


@@ -13,7 +13,7 @@
"\n",
"`Textract` supports `JPEG`, `PNG`, `PDF`, and `TIFF` file formats; more information is available in [the documentation](https://docs.aws.amazon.com/textract/latest/dg/limits-document.html).\n",
"\n",
-"The following samples demonstrate the use of `Amazon Textract` in combination with LangChain as a DocumentLoader."
+"The following examples demonstrate the use of `Amazon Textract` in combination with LangChain as a DocumentLoader."
]
},
{
@@ -41,7 +41,7 @@
"id": "400b25c6-befa-4730-a201-39ff112c8858",
"metadata": {},
"source": [
-"## Sample 1\n",
+"## Example 1: Loading from a local file\n",
"\n",
"The first example uses a local file, which internally will be sent to Amazon Textract sync API [DetectDocumentText](https://docs.aws.amazon.com/textract/latest/dg/API_DetectDocumentText.html). \n",
"\n",
@@ -100,8 +100,8 @@
"id": "4cf7f19c-3635-453a-9c76-4baf98b8d7f4",
"metadata": {},
"source": [
-"## Sample 2\n",
-"The next sample loads a file from an HTTPS endpoint. \n",
+"## Example 2: Loading from a URL\n",
+"The next example loads a file from an HTTPS endpoint. \n",
"It has to be single page, as Amazon Textract requires all multi-page documents to be stored on S3."
]
},
@@ -150,7 +150,7 @@
"id": "3a9cd8ec-e663-4dc7-9db1-d2f575253141",
"metadata": {},
"source": [
-"## Sample 3\n",
+"## Example 3: Loading multi-page PDF documents\n",
"\n",
"Processing a multi-page document requires the document to be on S3. The sample document resides in a bucket in us-east-2, and Textract needs to be called in that same region to succeed, so we set the region_name on the client and pass that client to the loader to ensure Textract is called from us-east-2. You could also run your notebook in us-east-2, set AWS_DEFAULT_REGION to us-east-2, or, when running in a different environment, pass in a boto3 Textract client with that region name, as in the cell below."
]
@@ -214,7 +214,7 @@
}
},
"source": [
-"## Sample 4\n",
+"## Example 4: Customizing the output format\n",
"\n",
"You have the option to pass an additional parameter called `linearization_config` to the AmazonTextractPDFLoader which will determine how the text output will be linearized by the parser after Textract runs."
]
@@ -248,7 +248,7 @@
"## Using the AmazonTextractPDFLoader in a LangChain chain (e.g. OpenAI)\n",
"\n",
"The AmazonTextractPDFLoader can be used in a chain the same way the other loaders are used.\n",
-"Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this sample, which is worth checking out as well."
+"Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this example, which is worth checking out as well."
]
},
{