mirror of https://github.com/hwchase17/langchain.git
synced 2025-07-19 19:11:33 +00:00
docs: update titles for Textract examples (#32063)
**In this PR I am doing two things:**
1. Adding titles to the 4 examples we have, so the reader can grasp the essence of each section quickly.
2. Replacing 'samples' with 'examples' for clarity.

**Why is 'examples' better terminology than 'samples' here?**
1. The page used 'samples' and 'examples' interchangeably, which led to confusion. Now 'examples' refers to the use cases, while 'samples' refers to the sample data being used.
2. This is consistent with the rest of the docs, which typically use 'examples' for examples, e.g. https://python.langchain.com/docs/integrations/callbacks/fiddler/
parent ad44f0688b
commit 2ab2cab203
@@ -13,7 +13,7 @@
 "\n",
 "`Textract` supports `JPEG`, `PNG`, `PDF`, and `TIFF` file formats; more information is available in [the documentation](https://docs.aws.amazon.com/textract/latest/dg/limits-document.html).\n",
 "\n",
-"The following samples demonstrate the use of `Amazon Textract` in combination with LangChain as a DocumentLoader."
+"The following examples demonstrate the use of `Amazon Textract` in combination with LangChain as a DocumentLoader."
 ]
 },
 {
@@ -41,7 +41,7 @@
 "id": "400b25c6-befa-4730-a201-39ff112c8858",
 "metadata": {},
 "source": [
-"## Sample 1\n",
+"## Example 1: Loading from a local file\n",
 "\n",
 "The first example uses a local file, which internally will be sent to Amazon Textract sync API [DetectDocumentText](https://docs.aws.amazon.com/textract/latest/dg/API_DetectDocumentText.html). \n",
 "\n",
@@ -100,8 +100,8 @@
 "id": "4cf7f19c-3635-453a-9c76-4baf98b8d7f4",
 "metadata": {},
 "source": [
-"## Sample 2\n",
-"The next sample loads a file from an HTTPS endpoint. \n",
+"## Example 2: Loading from a URL\n",
+"The next example loads a file from an HTTPS endpoint. \n",
 "It has to be single page, as Amazon Textract requires all multi-page documents to be stored on S3."
 ]
 },
@@ -150,7 +150,7 @@
 "id": "3a9cd8ec-e663-4dc7-9db1-d2f575253141",
 "metadata": {},
 "source": [
-"## Sample 3\n",
+"## Example 3: Loading multi-page PDF documents\n",
 "\n",
 "Processing a multi-page document requires the document to be on S3. The sample document resides in a bucket in us-east-2 and Textract needs to be called in that same region to be successful, so we set the region_name on the client and pass that in to the loader to ensure Textract is called from us-east-2. You could also to have your notebook running in us-east-2, setting the AWS_DEFAULT_REGION set to us-east-2 or when running in a different environment, pass in a boto3 Textract client with that region name like in the cell below."
 ]
@@ -214,7 +214,7 @@
 }
 },
 "source": [
-"## Sample 4\n",
+"## Example 4: Customizing the output format\n",
 "\n",
 "You have the option to pass an additional parameter called `linearization_config` to the AmazonTextractPDFLoader which will determine how the text output will be linearized by the parser after Textract runs."
 ]
@@ -248,7 +248,7 @@
 "## Using the AmazonTextractPDFLoader in a LangChain chain (e.g. OpenAI)\n",
 "\n",
 "The AmazonTextractPDFLoader can be used in a chain the same way the other loaders are used.\n",
-"Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this sample, which is worth checking out as well."
+"Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this example, which is worth checking out as well."
 ]
 },
 {