diff --git a/docs/docs/integrations/document_loaders/azure_ai_data.ipynb b/docs/docs/integrations/document_loaders/azure_ai_data.ipynb index 45750dcc7a8..bdda38d0bd0 100644 --- a/docs/docs/integrations/document_loaders/azure_ai_data.ipynb +++ b/docs/docs/integrations/document_loaders/azure_ai_data.ipynb @@ -8,10 +8,10 @@ "# Azure AI Data\n", "\n", ">[Azure AI Studio](https://ai.azure.com/) provides the capability to upload data assets to cloud storage and register existing data assets from the following sources:\n", - "\n", - "- Microsoft OneLake\n", - "- Azure Blob Storage\n", - "- Azure Data Lake gen 2\n", + ">\n", + ">- `Microsoft OneLake`\n", + ">- `Azure Blob Storage`\n", + ">- `Azure Data Lake gen 2`\n", "\n", "The benefit of this approach over `AzureBlobStorageContainerLoader` and `AzureBlobStorageFileLoader` is that authentication is handled seamlessly to cloud storage. You can use either *identity-based* data access control to the data or *credential-based* (e.g. SAS token, account key). 
In the case of credential-based data access you do not need to specify secrets in your code or set up key vaults - the system handles that for you.\n", "\n", @@ -166,7 +166,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.6" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/document_loaders/azure_document_intelligence.ipynb b/docs/docs/integrations/document_loaders/azure_document_intelligence.ipynb index dc68783d3d9..3f6e6038777 100644 --- a/docs/docs/integrations/document_loaders/azure_document_intelligence.ipynb +++ b/docs/docs/integrations/document_loaders/azure_document_intelligence.ipynb @@ -13,22 +13,31 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Azure AI Document Intelligence (formerly known as Azure Form Recognizer) is machine-learning \n", - "based service that extracts text (including handwriting), tables or key-value-pairs from\n", - "scanned documents or images.\n", + ">[Azure AI Document Intelligence](https://aka.ms/doc-intelligence) (formerly known as `Azure Form Recognizer`) is a machine-learning \n", + ">based service that extracts text (including handwriting), tables, or key-value pairs from\n", + ">scanned documents or images.\n", + ">\n", + ">Document Intelligence supports `PDF`, `JPEG`, `PNG`, `BMP`, and `TIFF`.\n", "\n", - "This current implementation of a loader using Document Intelligence is able to incorporate content page-wise and turn it into LangChain documents.\n", - "\n", - "Document Intelligence supports PDF, JPEG, PNG, BMP, or TIFF.\n", - "\n", - "Further documentation is available at https://aka.ms/doc-intelligence.\n" + "The current implementation of this loader uses `Document Intelligence` to incorporate content page-wise and turn it into LangChain documents.\n" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + 
"text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.2\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], "source": [ "%pip install langchain langchain-community azure-ai-documentintelligence -q" ] @@ -126,7 +135,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -140,7 +149,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.10.12" }, "vscode": { "interpreter": { diff --git a/docs/docs/integrations/platforms/microsoft.mdx b/docs/docs/integrations/platforms/microsoft.mdx index 8fde47d1103..6c616dc20e6 100644 --- a/docs/docs/integrations/platforms/microsoft.mdx +++ b/docs/docs/integrations/platforms/microsoft.mdx @@ -49,6 +49,50 @@ from langchain_community.llms import AzureOpenAI ## Document loaders +### Azure AI Data + +>[Azure AI Studio](https://ai.azure.com/) provides the capability to upload data assets +> to cloud storage and register existing data assets from the following sources: +> +>- `Microsoft OneLake` +>- `Azure Blob Storage` +>- `Azure Data Lake gen 2` + +First, install the required Python packages. + +```bash +pip install azureml-fsspec azure-ai-generative +``` + +See a [usage example](/docs/integrations/document_loaders/azure_ai_data).
+ +```python +from langchain_community.document_loaders import AzureAIDataLoader +``` + + +### Azure AI Document Intelligence + +>[Azure AI Document Intelligence](https://aka.ms/doc-intelligence) (formerly known +> as `Azure Form Recognizer`) is a machine-learning +> based service that extracts text (including handwriting), tables, or key-value pairs +> from scanned documents or images. +> +>Document Intelligence supports `PDF`, `JPEG`, `PNG`, `BMP`, and `TIFF`. + +First, install the required Python package. + +```bash +pip install azure-ai-documentintelligence +``` + +See a [usage example](/docs/integrations/document_loaders/azure_document_intelligence). + +```python +from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader +``` + + ### Azure Blob Storage >[Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data.