mirror of https://github.com/hwchase17/langchain.git
synced 2025-06-29 01:48:57 +00:00

docs: Microsoft platform page update (#15420)

Added two new document_loader references. Improved the format consistency of the example pages.

parent b8c6ebf647 · commit 1e6519edc2
@@ -8,10 +8,10 @@
     "# Azure AI Data\n",
     "\n",
     ">[Azure AI Studio](https://ai.azure.com/) provides the capability to upload data assets to cloud storage and register existing data assets from the following sources:\n",
-    "\n",
-    "- Microsoft OneLake\n",
-    "- Azure Blob Storage\n",
-    "- Azure Data Lake gen 2\n",
+    ">\n",
+    ">- `Microsoft OneLake`\n",
+    ">- `Azure Blob Storage`\n",
+    ">- `Azure Data Lake gen 2`\n",
     "\n",
     "The benefit of this approach over `AzureBlobStorageContainerLoader` and `AzureBlobStorageFileLoader` is that authentication is handled seamlessly to cloud storage. You can use either *identity-based* data access control to the data or *credential-based* (e.g. SAS token, account key). In the case of credential-based data access you do not need to specify secrets in your code or set up key vaults - the system handles that for you.\n",
     "\n",
@@ -166,7 +166,7 @@
     "name": "python",
     "nbconvert_exporter": "python",
     "pygments_lexer": "ipython3",
-    "version": "3.11.6"
+    "version": "3.10.12"
    }
   },
   "nbformat": 4,
@@ -13,22 +13,31 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-    "Azure AI Document Intelligence (formerly known as Azure Form Recognizer) is machine-learning \n",
-    "based service that extracts text (including handwriting), tables or key-value-pairs from\n",
-    "scanned documents or images.\n",
-    "\n",
-    "This current implementation of a loader using Document Intelligence is able to incorporate content page-wise and turn it into LangChain documents.\n",
-    "\n",
-    "Document Intelligence supports PDF, JPEG, PNG, BMP, or TIFF.\n",
-    "\n",
-    "Further documentation is available at https://aka.ms/doc-intelligence.\n"
+    ">[Azure AI Document Intelligence](https://aka.ms/doc-intelligence) (formerly known as `Azure Form Recognizer`) is a machine-learning \n",
+    ">based service that extracts text (including handwriting), tables or key-value-pairs from\n",
+    ">scanned documents or images.\n",
+    ">\n",
+    ">Document Intelligence supports `PDF`, `JPEG`, `PNG`, `BMP`, or `TIFF`.\n",
+    "\n",
+    "This current implementation of a loader using `Document Intelligence` can incorporate content page-wise and turn it into LangChain documents.\n"
     ]
    },
    {
     "cell_type": "code",
-    "execution_count": null,
+    "execution_count": 1,
     "metadata": {},
-    "outputs": [],
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "\n",
+       "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.2\u001b[0m\n",
+       "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
+       "Note: you may need to restart the kernel to use updated packages.\n"
+      ]
+     }
+    ],
     "source": [
     "%pip install langchain langchain-community azure-ai-documentintelligence -q"
     ]
@@ -126,7 +135,7 @@
    ],
    "metadata": {
     "kernelspec": {
-     "display_name": "Python 3",
+     "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
     },
@@ -140,7 +149,7 @@
     "name": "python",
     "nbconvert_exporter": "python",
     "pygments_lexer": "ipython3",
-    "version": "3.8.10"
+    "version": "3.10.12"
    },
    "vscode": {
     "interpreter": {
@@ -49,6 +49,50 @@ from langchain_community.llms import AzureOpenAI
 
 ## Document loaders
 
+### Azure AI Data
+
+>[Azure AI Studio](https://ai.azure.com/) provides the capability to upload data assets
+> to cloud storage and register existing data assets from the following sources:
+>
+>- `Microsoft OneLake`
+>- `Azure Blob Storage`
+>- `Azure Data Lake gen 2`
+
+First, you need to install several python packages.
+
+```bash
+pip install azureml-fsspec azure-ai-generative
+```
+
+See a [usage example](/docs/integrations/document_loaders/azure_ai_data).
+
+```python
+from langchain.document_loaders import AzureAIDataLoader
+```
+
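The `AzureAIDataLoader` import shown in the hunk can be exercised roughly as follows. This is a sketch, not the official example: the `url`/`glob` parameters match the loader's signature in `langchain`, but the environment-variable name and glob pattern are placeholders, and actually running it requires the packages above plus access to a registered data asset.

```python
import os


def load_azure_ai_data(data_asset_url: str, glob_pattern: str = "*"):
    """Load documents from a registered Azure AI Studio data asset (sketch)."""
    # Deferred import: requires `pip install langchain azureml-fsspec azure-ai-generative`.
    from langchain.document_loaders import AzureAIDataLoader

    # `url` points at the registered data asset; `glob` optionally filters files.
    loader = AzureAIDataLoader(url=data_asset_url, glob=glob_pattern)
    return loader.load()


# Guarded so the sketch can be read without Azure credentials;
# AZURE_DATA_ASSET_URL is a placeholder variable name, not an SDK convention.
if os.environ.get("AZURE_DATA_ASSET_URL"):
    docs = load_azure_ai_data(os.environ["AZURE_DATA_ASSET_URL"], "*.pdf")
    print(f"Loaded {len(docs)} documents")
```

Because authentication is handled by the data-asset registration (identity- or credential-based, as the notebook text explains), no storage secrets appear in the call.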
+
+### Azure AI Document Intelligence
+
+>[Azure AI Document Intelligence](https://aka.ms/doc-intelligence) (formerly known
+> as `Azure Form Recognizer`) is a machine-learning
+> based service that extracts text (including handwriting), tables or key-value-pairs
+> from scanned documents or images.
+>
+>Document Intelligence supports `PDF`, `JPEG`, `PNG`, `BMP`, or `TIFF`.
+
+First, you need to install a python package.
+
+```bash
+pip install azure-ai-documentintelligence
+```
+
+See a [usage example](/docs/integrations/document_loaders/azure_document_intelligence).
+
+```python
+from langchain.document_loaders import AzureAIDocumentIntelligenceLoader
+```
+
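A hedged sketch of how the `AzureAIDocumentIntelligenceLoader` import above is typically wired up: the `api_endpoint`/`api_key`/`file_path`/`api_model` parameters follow the loader's signature, while the environment-variable names and the `invoice.pdf` file are placeholders. Running it for real needs an Azure Document Intelligence resource.

```python
import os


def load_scanned_file(file_path: str):
    """Extract a scanned file page-wise into LangChain documents (sketch)."""
    # Deferred import: requires `pip install langchain azure-ai-documentintelligence`.
    from langchain.document_loaders import AzureAIDocumentIntelligenceLoader

    loader = AzureAIDocumentIntelligenceLoader(
        api_endpoint=os.environ["DI_ENDPOINT"],  # placeholder env-var names
        api_key=os.environ["DI_KEY"],
        file_path=file_path,
        api_model="prebuilt-layout",  # layout model: text, tables, key-value pairs
    )
    return loader.load()


# Guarded so the sketch can be read without a Document Intelligence resource.
if os.environ.get("DI_ENDPOINT") and os.environ.get("DI_KEY"):
    pages = load_scanned_file("invoice.pdf")
    print(f"{len(pages)} document(s) extracted")
```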
 
 ### Azure Blob Storage
 
 >[Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data.
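The `AzureBlobStorageContainerLoader` mentioned earlier (the credential-based alternative to the Azure AI Data loader) can be used along these lines. A sketch under assumptions: the `conn_str`/`container`/`prefix` parameters follow the loader's signature, but the env-var name, container name, and prefix are placeholders, and a real run needs a storage account.

```python
import os


def load_blob_container(container: str, prefix: str = ""):
    """Load every blob under a container (optionally a prefix) as documents (sketch)."""
    # Deferred import: requires `pip install langchain azure-storage-blob`.
    from langchain.document_loaders import AzureBlobStorageContainerLoader

    loader = AzureBlobStorageContainerLoader(
        conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
        container=container,
        prefix=prefix,
    )
    return loader.load()


# Guarded so the sketch can be read without a storage account;
# the container and prefix below are illustrative only.
if os.environ.get("AZURE_STORAGE_CONNECTION_STRING"):
    docs = load_blob_container("documents", prefix="reports/")
    print(f"Loaded {len(docs)} documents")
```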