docs Microsoft platform page update (#15420)

Added two new document_loader references. Improved the format
consistency of the example pages.
Leonid Ganeline 2024-01-02 14:59:40 -08:00 committed by GitHub
parent b8c6ebf647
commit 1e6519edc2
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 70 additions and 17 deletions


@@ -8,10 +8,10 @@
"# Azure AI Data\n",
"\n",
">[Azure AI Studio](https://ai.azure.com/) provides the capability to upload data assets to cloud storage and register existing data assets from the following sources:\n",
"\n",
"- Microsoft OneLake\n",
"- Azure Blob Storage\n",
"- Azure Data Lake gen 2\n",
">\n",
">- `Microsoft OneLake`\n",
">- `Azure Blob Storage`\n",
">- `Azure Data Lake gen 2`\n",
"\n",
"The benefit of this approach over `AzureBlobStorageContainerLoader` and `AzureBlobStorageFileLoader` is that authentication to cloud storage is handled seamlessly. You can use either *identity-based* data access control or *credential-based* data access (e.g. SAS token, account key). With credential-based data access, you do not need to specify secrets in your code or set up key vaults; the system handles that for you.\n",
"\n",
@@ -166,7 +166,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
"version": "3.10.12"
}
},
"nbformat": 4,


@@ -13,22 +13,31 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Azure AI Document Intelligence (formerly known as Azure Form Recognizer) is machine-learning \n",
"based service that extracts text (including handwriting), tables or key-value-pairs from\n",
"scanned documents or images.\n",
">[Azure AI Document Intelligence](https://aka.ms/doc-intelligence) (formerly known as `Azure Form Recognizer`) is a machine-learning-based\n",
">service that extracts text (including handwriting), tables, and key-value pairs from\n",
">scanned documents or images.\n",
">\n",
">Document Intelligence supports the `PDF`, `JPEG`, `PNG`, `BMP`, and `TIFF` formats.\n",
"\n",
"This current implementation of a loader using Document Intelligence is able to incorporate content page-wise and turn it into LangChain documents.\n",
"\n",
"Document Intelligence supports PDF, JPEG, PNG, BMP, or TIFF.\n",
"\n",
"Further documentation is available at https://aka.ms/doc-intelligence.\n"
"The current implementation of the `Document Intelligence` loader can incorporate content page-wise and turn it into LangChain documents.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.2\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install langchain langchain-community azure-ai-documentintelligence -q"
]
@@ -126,7 +135,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -140,7 +149,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
"version": "3.10.12"
},
"vscode": {
"interpreter": {


@@ -49,6 +49,50 @@ from langchain_community.llms import AzureOpenAI
## Document loaders

### Azure AI Data

>[Azure AI Studio](https://ai.azure.com/) provides the capability to upload data assets
> to cloud storage and register existing data assets from the following sources:
>
>- `Microsoft OneLake`
>- `Azure Blob Storage`
>- `Azure Data Lake gen 2`

First, you need to install several Python packages.

```bash
pip install azureml-fsspec azure-ai-generative
```

See a [usage example](/docs/integrations/document_loaders/azure_ai_data).

```python
from langchain.document_loaders import AzureAIDataLoader
```
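
The loader reads data from a cloud storage path. As a minimal sketch, assuming that a long-form `azureml://` datastore URI (the path format used by Azure Machine Learning datastores and read through `azureml-fsspec`) is an acceptable value for the loader's `url` argument, the hypothetical helper below assembles such a URI from its parts. The helper name, its parameters, and the placeholder identifiers are illustrative only and are not part of LangChain or the Azure SDK.

```python
def build_azureml_datastore_uri(
    subscription_id: str,
    resource_group: str,
    workspace: str,
    datastore: str,
    path: str,
) -> str:
    """Assemble a long-form azureml:// URI for a path inside a registered datastore.

    Hypothetical helper for illustration; not part of LangChain or the Azure SDK.
    """
    # Drop leading slashes so the result never contains "paths//".
    relative_path = path.lstrip("/")
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}"
        f"/paths/{relative_path}"
    )


if __name__ == "__main__":
    # Placeholder identifiers; replace them with your own workspace values.
    uri = build_azureml_datastore_uri(
        "<subscription-id>", "my-rg", "my-workspace", "workspaceblobstore", "/reports/"
    )
    print(uri)
    # Assumed usage, which needs Azure credentials and is not executed here:
    # from langchain.document_loaders import AzureAIDataLoader
    # docs = AzureAIDataLoader(url=uri).load()
```

Stripping the leading `/` keeps the path segment clean when callers pass absolute-looking paths. Loading from the resulting URI would typically require valid Azure credentials, which is why that call is left commented out.
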

### Azure AI Document Intelligence

>[Azure AI Document Intelligence](https://aka.ms/doc-intelligence) (formerly known
> as `Azure Form Recognizer`) is a machine-learning-based service
> that extracts text (including handwriting), tables, and key-value pairs
> from scanned documents or images.
>
>Document Intelligence supports the `PDF`, `JPEG`, `PNG`, `BMP`, and `TIFF` formats.
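
Because the service accepts only the formats listed above, it can help to filter files before handing them to the loader. The sketch below is a hypothetical pre-check: the function names and the extension set are assumptions, and treating `.jpg` and `.tif` as spellings of `JPEG` and `TIFF` is a mapping made for this example, not an official list from the service.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions assumed to correspond to the PDF, JPEG, PNG, BMP and TIFF formats;
# ".jpg" and ".tif" are included as common alternate spellings (an assumption).
SUPPORTED_EXTENSIONS = {".pdf", ".jpeg", ".jpg", ".png", ".bmp", ".tiff", ".tif"}


def is_supported_by_document_intelligence(file_path: str) -> bool:
    """Return True if the file extension maps to one of the listed formats."""
    # Compare case-insensitively so "scan.PDF" is accepted.
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS


def filter_supported(paths: Iterable[str]) -> List[str]:
    """Keep only paths whose extension is in SUPPORTED_EXTENSIONS, preserving order."""
    return [p for p in paths if is_supported_by_document_intelligence(p)]


if __name__ == "__main__":
    print(filter_supported(["invoice.PDF", "receipt.jpg", "notes.docx", "scan.tiff"]))
```

Running the script prints `['invoice.PDF', 'receipt.jpg', 'scan.tiff']`. Only the extension is checked; file contents are not inspected, so a mislabeled file would still pass through.
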

First, you need to install a Python package.

```bash
pip install azure-ai-documentintelligence
```

See a [usage example](/docs/integrations/document_loaders/azure_document_intelligence).

```python
from langchain.document_loaders import AzureAIDocumentIntelligenceLoader
```
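
The notebook updated in this commit describes the loader as incorporating content page-wise and turning it into LangChain documents. The sketch below illustrates that idea only: it assumes the text of each page has already been extracted, and it uses a small stand-in `Document` dataclass with the same `page_content` and `metadata` fields as LangChain's `Document`, so it runs without LangChain or Azure installed. The `pages_to_documents` helper and the `page` metadata key are hypothetical and are not the loader's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Stand-in with the same two fields as LangChain's Document."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def pages_to_documents(pages: List[str], source: str) -> List[Document]:
    """Create one Document per non-empty page, recording the 1-based page number.

    Hypothetical helper illustrating page-wise loading; not the loader's code.
    """
    documents = []
    for page_number, text in enumerate(pages, start=1):
        # Skip pages where no text was extracted (e.g. blank scans).
        if not text.strip():
            continue
        documents.append(
            Document(page_content=text, metadata={"source": source, "page": page_number})
        )
    return documents


if __name__ == "__main__":
    for doc in pages_to_documents(["Invoice #17", "   ", "Total: 42.00"], "invoice.pdf"):
        print(doc.metadata["page"], doc.page_content)
```

Keeping the original page number in `metadata`, rather than renumbering after empty pages are skipped, lets a retrieved chunk be traced back to its page in the source file.
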

### Azure Blob Storage

>[Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data.