docs: update Textract docs (#31992)

I am modifying two things:

1. Replacing "This sample demonstrates" with "The following samples demonstrate",
since we're talking about at least 4 samples
2. Moving the sentence to after the definition of Textract to keep the document
organized (Textract definition, then samples)

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
Ahmad Elmalah 2025-07-14 18:36:29 +03:00 committed by GitHub
parent 553ac1863b
commit 2fdccd789c


@@ -11,11 +11,9 @@
">\n",
">It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Today, many companies manually extract data from scanned documents such as PDFs, images, tables, and forms, or through simple OCR software that requires manual configuration (which often must be updated when the form changes). To overcome these manual and expensive processes, `Textract` uses ML to read and process any type of document, accurately extracting text, handwriting, tables, and other data with no manual effort. \n",
"\n",
"This sample demonstrates the use of `Amazon Textract` in combination with LangChain as a DocumentLoader.\n",
"`Textract` supports `JPEG`, `PNG`, `PDF`, and `TIFF` file formats; more information is available in [the documentation](https://docs.aws.amazon.com/textract/latest/dg/limits-document.html).\n",
"\n",
"`Textract` supports`PDF`, `TIFF`, `PNG` and `JPEG` format.\n",
"\n",
"`Textract` supports these [document sizes, languages and characters](https://docs.aws.amazon.com/textract/latest/dg/limits-document.html)."
"The following samples demonstrate the use of `Amazon Textract` in combination with LangChain as a DocumentLoader."
]
},
{
@@ -310,17 +308,6 @@
"\n",
"chain.run(input_documents=documents, question=query)"
]
},
{
"cell_type": "markdown",
"id": "bd97f1c90aff6a83",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": []
}
],
"metadata": {