From d40bdd6257ff49beab0d10998fd94513ee80bfd7 Mon Sep 17 00:00:00 2001 From: Isaac Francisco <78627776+isahers1@users.noreply.github.com> Date: Tue, 20 Aug 2024 10:54:42 -0700 Subject: [PATCH] docs: more indexing of document loaders (#25500) Co-authored-by: Bagatur Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> --- .../integrations/document_loaders/index.mdx | 24 ++ .../document_loaders/notion.ipynb | 179 ++++++++++-- .../document_loaders/notiondb.ipynb | 161 ----------- .../integrations/document_loaders/xml.ipynb | 163 ++++++++++- docs/docs/integrations/providers/notion.mdx | 11 +- docs/src/theme/FeatureTables.js | 266 ++++++++++++++++++ docs/vercel.json | 4 + 7 files changed, 611 insertions(+), 197 deletions(-) delete mode 100644 docs/docs/integrations/document_loaders/notiondb.ipynb diff --git a/docs/docs/integrations/document_loaders/index.mdx b/docs/docs/integrations/document_loaders/index.mdx index 21076def823..6dee9374e97 100644 --- a/docs/docs/integrations/document_loaders/index.mdx +++ b/docs/docs/integrations/document_loaders/index.mdx @@ -33,6 +33,30 @@ The below document loaders allow you to load PDF documents. +## Cloud Providers + +The below document loaders allow you to load documents from your favorite cloud providers. + + + +## Social Platforms + +The below document loaders allow you to load documents from different social media platforms. + + + +## Messaging Services + +The below document loaders allow you to load data from different messaging platforms. + + + +## Productivity Tools + +The below document loaders allow you to load data from commonly used productivity tools. + + + ## Common File Types The below document loaders allow you to load data from common data formats.
diff --git a/docs/docs/integrations/document_loaders/notion.ipynb b/docs/docs/integrations/document_loaders/notion.ipynb index 1c81e3765c3..dc2c5fff53e 100644 --- a/docs/docs/integrations/document_loaders/notion.ipynb +++ b/docs/docs/integrations/document_loaders/notion.ipynb @@ -1,19 +1,170 @@ { "cells": [ { + "attachments": {}, "cell_type": "markdown", "id": "1dc7df1d", "metadata": {}, "source": [ - "# Notion DB 1/2\n", + "# Notion DB\n", "\n", ">[Notion](https://www.notion.so/) is a collaboration platform with modified Markdown support that integrates kanban boards, tasks, wikis and databases. It is an all-in-one workspace for notetaking, knowledge and data management, and project and task management.\n", "\n", - "This notebook covers how to load documents from a Notion database dump.\n", + "`NotionDBLoader` is a Python class for loading content from a `Notion` database. It retrieves pages from the database, reads their content, and returns a list of Document objects. `NotionDirectoryLoader` is used for loading data from a Notion database dump.\n", "\n", - "In order to get this notion dump, follow these instructions:\n", + "## Requirements\n", "\n", - "## 🧑 Instructions for ingesting your own dataset\n", + "- A `Notion` Database\n", + "- Notion Integration Token\n", + "\n", + "## Setup\n", + "\n", + "### 1. Create a Notion Table Database\n", + "Create a new table database in Notion. You can add any columns to the database, and they will be treated as metadata. For example, you can add the following columns:\n", + "\n", + "- Title: Set Title as the default property.\n", + "- Categories: A Multi-select property to store categories associated with the page.\n", + "- Keywords: A Multi-select property to store keywords associated with the page.\n", + "\n", + "Add your content to the body of each page in the database. The NotionDBLoader will extract the content and metadata from these pages.\n", + "\n", + "### 2.
Create a Notion Integration\n", + "To create a Notion Integration, follow these steps:\n", + "\n", + "1. Visit the [Notion Developers](https://www.notion.com/my-integrations) page and log in with your Notion account.\n", + "2. Click on the \"+ New integration\" button.\n", + "3. Give your integration a name and choose the workspace where your database is located.\n", + "4. Select the required capabilities; this loader only needs the Read content capability.\n", + "5. Click the \"Submit\" button to create the integration.\n", + "\nOnce the integration is created, you'll be provided with an `Integration Token (API key)`. Copy this token and keep it safe, as you'll need it to use the NotionDBLoader.\n", + "\n", + "### 3. Connect the Integration to the Database\n", + "To connect your integration to the database, follow these steps:\n", + "\n", + "1. Open your database in Notion.\n", + "2. Click on the three-dot menu icon in the top right corner of the database view.\n", + "3. Click on the \"+ New integration\" button.\n", + "4. Find your integration; you may need to start typing its name in the search box.\n", + "5. Click on the \"Connect\" button to connect the integration to the database.\n", + "\n", + "\n", + "### 4. Get the Database ID\n", + "To get the database ID, follow these steps:\n", + "\n", + "1. Open your database in Notion.\n", + "2. Click on the three-dot menu icon in the top right corner of the database view.\n", + "3. Select \"Copy link\" from the menu to copy the database URL to your clipboard.\n", + "4. The database ID is the long string of alphanumeric characters found in the URL. It typically looks like this: https://www.notion.so/username/8935f9d140a04f95a872520c4f123456?v=....
In this example, the database ID is 8935f9d140a04f95a872520c4f123456.\n", + "\n", + "With the database properly set up and the integration token and database ID in hand, you can now use the NotionDBLoader code to load content and metadata from your Notion database.\n", + "\n", + "### 5. Installation\n", + "\n", + "Install the `langchain-community` integration package.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "412b38dc", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -qU langchain-community" + ] + }, + { + "cell_type": "markdown", + "id": "cced2931", + "metadata": {}, + "source": [ + "\n", + "## Notion Database Loader\n", + "`NotionDBLoader` is one of the document loaders in the `langchain_community` package. You can use it as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "6c3a314c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "········\n", + "········\n" + ] + } + ], + "source": [ + "from getpass import getpass\n", + "\n", + "NOTION_TOKEN = getpass()\n", + "DATABASE_ID = getpass()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "007c5cbf", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.document_loaders import NotionDBLoader" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "a1caec59", + "metadata": {}, + "outputs": [], + "source": [ + "loader = NotionDBLoader(\n", + " integration_token=NOTION_TOKEN,\n", + " database_id=DATABASE_ID,\n", + " request_timeout_sec=30, # optional, defaults to 10\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "b1c30ff7", + "metadata": {}, + "outputs": [], + "source": [ + "docs = loader.load()" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "4f5789a2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "print(docs)" +
] + }, + { + "cell_type": "markdown", + "id": "2b87ab5c", + "metadata": {}, + "source": [ + "## Notion Directory Loader\n", + "\n", + "### Setup\n", "\n", "Export your dataset from Notion. You can do this by clicking on the three dots in the upper right hand corner and then clicking `Export`.\n", "\n", @@ -27,33 +178,27 @@ "unzip Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip -d Notion_DB\n", "```\n", "\n", - "Run the following command to ingest the data." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "007c5cbf", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_community.document_loaders import NotionDirectoryLoader" + "### Usage\n", + "\n", + "Run the following command to ingest the data you just downloaded." ] }, { "cell_type": "code", "execution_count": null, - "id": "a1caec59", + "id": "9debffdd", "metadata": {}, "outputs": [], "source": [ + "from langchain_community.document_loaders import NotionDirectoryLoader\n", + "\n", "loader = NotionDirectoryLoader(\"Notion_DB\")" ] }, { "cell_type": "code", "execution_count": null, - "id": "b1c30ff7", + "id": "81008087", "metadata": {}, "outputs": [], "source": [ diff --git a/docs/docs/integrations/document_loaders/notiondb.ipynb b/docs/docs/integrations/document_loaders/notiondb.ipynb deleted file mode 100644 index d612728f6ad..00000000000 --- a/docs/docs/integrations/document_loaders/notiondb.ipynb +++ /dev/null @@ -1,161 +0,0 @@ -{ - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "id": "1dc7df1d", - "metadata": {}, - "source": [ - "# Notion DB 2/2\n", - "\n", - ">[Notion](https://www.notion.so/) is a collaboration platform with modified Markdown support that integrates kanban boards, tasks, wikis and databases. It is an all-in-one workspace for notetaking, knowledge and data management, and project and task management.\n", - "\n", - "`NotionDBLoader` is a Python class for loading content from a `Notion` database. 
It retrieves pages from the database, reads their content, and returns a list of Document objects.\n", - "\n", - "## Requirements\n", - "\n", - "- A `Notion` Database\n", - "- Notion Integration Token\n", - "\n", - "## Setup\n", - "\n", - "### 1. Create a Notion Table Database\n", - "Create a new table database in Notion. You can add any column to the database and they will be treated as metadata. For example you can add the following columns:\n", - "\n", - "- Title: set Title as the default property.\n", - "- Categories: A Multi-select property to store categories associated with the page.\n", - "- Keywords: A Multi-select property to store keywords associated with the page.\n", - "\n", - "Add your content to the body of each page in the database. The NotionDBLoader will extract the content and metadata from these pages.\n", - "\n", - "## 2. Create a Notion Integration\n", - "To create a Notion Integration, follow these steps:\n", - "\n", - "1. Visit the [Notion Developers](https://www.notion.com/my-integrations) page and log in with your Notion account.\n", - "2. Click on the \"+ New integration\" button.\n", - "3. Give your integration a name and choose the workspace where your database is located.\n", - "4. Select the require capabilities, this extension only need the Read content capability\n", - "5. Click the \"Submit\" button to create the integration.\n", - "Once the integration is created, you'll be provided with an `Integration Token (API key)`. Copy this token and keep it safe, as you'll need it to use the NotionDBLoader.\n", - "\n", - "### 3. Connect the Integration to the Database\n", - "To connect your integration to the database, follow these steps:\n", - "\n", - "1. Open your database in Notion.\n", - "2. Click on the three-dot menu icon in the top right corner of the database view.\n", - "3. Click on the \"+ New integration\" button.\n", - "4. Find your integration, you may need to start typing its name in the search box.\n", - "5. 
Click on the \"Connect\" button to connect the integration to the database.\n", - "\n", - "\n", - "### 4. Get the Database ID\n", - "To get the database ID, follow these steps:\n", - "\n", - "1. Open your database in Notion.\n", - "2. Click on the three-dot menu icon in the top right corner of the database view.\n", - "3. Select \"Copy link\" from the menu to copy the database URL to your clipboard.\n", - "4. The database ID is the long string of alphanumeric characters found in the URL. It typically looks like this: https://www.notion.so/username/8935f9d140a04f95a872520c4f123456?v=.... In this example, the database ID is 8935f9d140a04f95a872520c4f123456.\n", - "\n", - "With the database properly set up and the integration token and database ID in hand, you can now use the NotionDBLoader code to load content and metadata from your Notion database.\n", - "\n", - "## Usage\n", - "NotionDBLoader is part of the langchain package's document loaders. You can use it as follows:" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "6c3a314c", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "········\n", - "········\n" - ] - } - ], - "source": [ - "from getpass import getpass\n", - "\n", - "NOTION_TOKEN = getpass()\n", - "DATABASE_ID = getpass()" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "007c5cbf", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_community.document_loaders import NotionDBLoader" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "a1caec59", - "metadata": {}, - "outputs": [], - "source": [ - "loader = NotionDBLoader(\n", - " integration_token=NOTION_TOKEN,\n", - " database_id=DATABASE_ID,\n", - " request_timeout_sec=30, # optional, defaults to 10\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "b1c30ff7", - "metadata": {}, - "outputs": [], - "source": [ - "docs = loader.load()" - ] - }, - { - 
"cell_type": "code", - "execution_count": 17, - "id": "4f5789a2", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "print(docs)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/docs/integrations/document_loaders/xml.ipynb b/docs/docs/integrations/document_loaders/xml.ipynb index 55f2f14c640..0e28b8e0a3b 100644 --- a/docs/docs/integrations/document_loaders/xml.ipynb +++ b/docs/docs/integrations/document_loaders/xml.ipynb @@ -2,18 +2,88 @@ "cells": [ { "cell_type": "markdown", - "id": "22a849cc", + "id": "72ccbe2b", "metadata": {}, "source": [ - "# XML\n", + "# UnstructuredXMLLoader\n", "\n", - "The `UnstructuredXMLLoader` is used to load `XML` files. The loader works with `.xml` files. The page content will be the text extracted from the XML tags." + "This notebook provides a quick overview for getting started with UnstructuredXMLLoader [document loader](https://python.langchain.com/v0.2/docs/concepts/#document-loaders). The `UnstructuredXMLLoader` is used to load `XML` files. The loader works with `.xml` files. 
The page content will be the text extracted from the XML tags.\n", + "\n", + "\n", + "## Overview\n", + "### Integration details\n", + "\n", + "\n", + "| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/document_loaders/file_loaders/unstructured/)|\n", + "| :--- | :--- | :---: | :---: | :---: |\n", + "| [UnstructuredXMLLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.xml.UnstructuredXMLLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ✅ | \n", + "### Loader features\n", + "| Source | Document Lazy Loading | Native Async Support |\n", + "| :---: | :---: | :---: | \n", + "| UnstructuredXMLLoader | ✅ | ❌ | \n", + "\n", + "## Setup\n", + "\n", + "To use the UnstructuredXMLLoader document loader, you'll need to install the `langchain-community` integration package.\n", + "\n", + "### Credentials\n", + "\n", + "No credentials are needed to use the UnstructuredXMLLoader." + ] + }, + { + "cell_type": "markdown", + "id": "fc4ba987", + "metadata": {}, + "source": [ + "If you want automated, best-in-class tracing of your model calls, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:" ] }, { "cell_type": "code", - "execution_count": 1, - "id": "e6616e3a", + "execution_count": null, + "id": "9fa4d5e5", + "metadata": {}, + "outputs": [], + "source": [ + "# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n", + "# os.environ[\"LANGSMITH_TRACING\"] = \"true\"" + ] + }, + { + "cell_type": "markdown", + "id": "38e53f22", + "metadata": {}, + "source": [ + "### Installation\n", + "\n", + "Install **langchain_community**."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fcd320ec", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -qU langchain_community" + ] + }, + { + "cell_type": "markdown", + "id": "a102f199", + "metadata": {}, + "source": [ + "## Initialization\n", + "\n", + "Now we can instantiate our loader object and load documents:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "2d198582", "metadata": {}, "outputs": [], "source": [ @@ -21,18 +91,91 @@ "\n", "loader = UnstructuredXMLLoader(\n", " \"./example_data/factbook.xml\",\n", - ")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "9bbb463c", + "metadata": {}, + "source": [ + "## Load" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "cd875e75", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Document(metadata={'source': './example_data/factbook.xml'}, page_content='United States\\n\\nWashington, DC\\n\\nJoe Biden\\n\\nBaseball\\n\\nCanada\\n\\nOttawa\\n\\nJustin Trudeau\\n\\nHockey\\n\\nFrance\\n\\nParis\\n\\nEmmanuel Macron\\n\\nSoccer\\n\\nTrinidad & Tobado\\n\\nPort of Spain\\n\\nKeith Rowley\\n\\nTrack & Field')" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ "docs = loader.load()\n", "docs[0]" ] }, { "cell_type": "code", - "execution_count": null, - "id": "a54342bb", + "execution_count": 4, + "id": "79b52cc0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'source': './example_data/factbook.xml'}\n" + ] + } + ], + "source": [ + "print(docs[0].metadata)" + ] + }, + { + "cell_type": "markdown", + "id": "557608e5", + "metadata": {}, + "source": [ + "## Lazy Load" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "e3b9e75c", "metadata": {}, "outputs": [], - "source": [] + "source": [ + "page = []\n", + "for doc in loader.lazy_load():\n", + " page.append(doc)\n", + " if len(page) >=
10:\n", + " # do some paged operation, e.g.\n", + " # index.upsert(page)\n", + "\n", + " page = []" + ] + }, + { + "cell_type": "markdown", + "id": "712aa98f", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all UnstructuredXMLLoader features and configurations, head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.xml.UnstructuredXMLLoader.html" + ] } ], "metadata": { @@ -51,7 +194,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.15" + "version": "3.11.9" } }, "nbformat": 4, diff --git a/docs/docs/integrations/providers/notion.mdx b/docs/docs/integrations/providers/notion.mdx index 7f513686800..6ed4fd306fc 100644 --- a/docs/docs/integrations/providers/notion.mdx +++ b/docs/docs/integrations/providers/notion.mdx @@ -12,16 +12,9 @@ All instructions are in examples below. We have two different loaders: `NotionDirectoryLoader` and `NotionDBLoader`. -See a [usage example for the NotionDirectoryLoader](/docs/integrations/document_loaders/notion). +See [usage examples for both loaders](/docs/integrations/document_loaders/notion). ```python -from langchain_community.document_loaders import NotionDirectoryLoader -``` - -See a [usage example for the NotionDBLoader](/docs/integrations/document_loaders/notiondb).
- - -```python -from langchain_community.document_loaders import NotionDBLoader +from langchain_community.document_loaders import NotionDirectoryLoader, NotionDBLoader ``` diff --git a/docs/src/theme/FeatureTables.js b/docs/src/theme/FeatureTables.js index f8321370812..1d44350eb0e 100644 --- a/docs/src/theme/FeatureTables.js +++ b/docs/src/theme/FeatureTables.js @@ -440,6 +440,266 @@ const FEATURE_TABLES = { columns: [], items: [], }, + cloud_provider_loaders: { + link: 'docs/integrations/loaders', + columns: [ + {title: "Document Loader", formatter: (item) => {item.name}}, + {title: "Description", formatter: (item) => item.source}, + {title: "Partner Package", formatter: (item) => item.partnerPackage ? "✅" : "❌"}, + {title: "API reference", formatter: (item) => {item.loaderName}}, + ], + items: [ + { + name: "AWS S3 Directory", + link: "aws_s3_directory", + source: "Load documents from an AWS S3 directory", + partnerPackage: false, + loaderName: "S3DirectoryLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_directory.S3DirectoryLoader.html" + }, + { + name: "AWS S3 File", + link: "aws_s3_file", + source: "Load documents from an AWS S3 file", + partnerPackage: false, + loaderName: "S3FileLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_file.S3FileLoader.html" + }, + { + name: "Azure AI Data", + link: "azure_ai_data", + source: "Load documents from Azure AI services", + partnerPackage: false, + loaderName: "AzureAIDataLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.azure_ai_data.AzureAIDataLoader.html" + }, + { + name: "Azure Blob Storage Container", + link: "azure_blob_storage_container", + source: "Load documents from an Azure Blob Storage container", + partnerPackage: false, + loaderName: "AzureBlobStorageContainerLoader", + apiLink: 
"https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.azure_blob_storage_container.AzureBlobStorageContainerLoader.html" + }, + { + name: "Azure Blob Storage File", + link: "azure_blob_storage_file", + source: "Load documents from an Azure Blob Storage file", + partnerPackage: false, + loaderName: "AzureBlobStorageFileLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.azure_blob_storage_file.AzureBlobStorageFileLoader.html" + }, + { + name: "Dropbox", + link: "dropbox", + source: "Load documents from Dropbox", + partnerPackage: false, + loaderName: "DropboxLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.dropbox.DropboxLoader.html" + }, + { + name: "Google Cloud Storage Directory", + link: "google_cloud_storage_directory", + source: "Load documents from a GCS bucket", + partnerPackage: true, + loaderName: "GCSDirectoryLoader", + apiLink: "https://api.python.langchain.com/en/latest/gcs_directory/langchain_google_community.gcs_directory.GCSDirectoryLoader.html" + }, + { + name: "Google Cloud Storage File", + link: "google_cloud_storage_file", + source: "Load documents from a GCS file object", + partnerPackage: true, + loaderName: "GCSFileLoader", + apiLink: "https://api.python.langchain.com/en/latest/gcs_file/langchain_google_community.gcs_file.GCSFileLoader.html" + }, + { + name: "Google Drive", + link: "google_drive", + source: "Load documents from Google Drive (Google Docs only)", + partnerPackage: true, + loaderName: "GoogleDriveLoader", + apiLink: "https://api.python.langchain.com/en/latest/drive/langchain_google_community.drive.GoogleDriveLoader.html" + }, + { + name: "Huawei OBS Directory", + link: "huawei_obs_directory", + source: "Load documents from Huawei Object Storage Service Directory", + partnerPackage: false, + loaderName: "OBSDirectoryLoader", + apiLink:
"https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.obs_directory.OBSDirectoryLoader.html" + }, + { + name: "Huawei OBS File", + link: "huawei_obs_file", + source: "Load documents from Huawei Object Storage Service File", + partnerPackage: false, + loaderName: "OBSFileLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.obs_file.OBSFileLoader.html" + }, + { + name: "Microsoft OneDrive", + link: "microsoft_onedrive", + source: "Load documents from Microsoft OneDrive", + partnerPackage: false, + loaderName: "OneDriveLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.onedrive.OneDriveLoader.html" + }, + { + name: "Microsoft SharePoint", + link: "microsoft_sharepoint", + source: "Load documents from Microsoft SharePoint", + partnerPackage: false, + loaderName: "SharePointLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.sharepoint.SharePointLoader.html" + + }, + { + name: "Tencent COS Directory", + link: "tencent_cos_directory", + source: "Load documents from Tencent Cloud Object Storage Directory", + partnerPackage: false, + loaderName: "TencentCOSDirectoryLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.tencent_cos_directory.TencentCOSDirectoryLoader.html" + }, + { + name: "Tencent COS File", + link: "tencent_cos_file", + source: "Load documents from Tencent Cloud Object Storage File", + partnerPackage: false, + loaderName: "TencentCOSFileLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.tencent_cos_file.TencentCOSFileLoader.html" + }, + ] + }, + messaging_loaders: { + link: 'docs/integrations/loaders', + columns: [ + {title: "Document Loader", formatter: (item) => {item.name}}, + {title: 
"API reference", formatter: (item) => {item.loaderName}}, + ], + items: [ + { + name: "Telegram", + link: "telegram", + loaderName: "TelegramChatFileLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.telegram.TelegramChatFileLoader.html" + }, + { + name: "WhatsApp", + link: "whatsapp_chat", + loaderName: "WhatsAppChatLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.whatsapp_chat.WhatsAppChatLoader.html" + }, + { + name: "Discord", + link: "discord", + loaderName: "DiscordChatLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.discord.DiscordChatLoader.html" + }, + { + name: "Facebook Chat", + link: "facebook_chat", + loaderName: "FacebookChatLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.facebook_chat.FacebookChatLoader.html" + }, + { + name: "Mastodon", + link: "mastodon", + loaderName: "MastodonTootsLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.mastodon.MastodonTootsLoader.html" + } + ] + }, + productivity_loaders: { + link: 'docs/integrations/loaders', + columns: [ + {title: "Document Loader", formatter: (item) => {item.name}}, + {title: "API reference", formatter: (item) => {item.loaderName}}, + ], + items: [ + { + name: "Figma", + link: "figma", + loaderName: "FigmaFileLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.figma.FigmaFileLoader.html" + }, + { + name: "Notion", + link: "notion", + loaderName: "NotionDirectoryLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.notion.NotionDirectoryLoader.html" + }, + { + name: "Slack", + link: "slack", + loaderName: "SlackDirectoryLoader", + apiLink:
"https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.slack_directory.SlackDirectoryLoader.html" + }, + { + name: "Quip", + link: "quip", + loaderName: "QuipLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.quip.QuipLoader.html" + }, + { + name: "Trello", + link: "trello", + loaderName: "TrelloLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.trello.TrelloLoader.html" + }, + { + name: "Roam", + link: "roam", + loaderName: "RoamLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.roam.RoamLoader.html" + }, + { + name: "GitHub", + link: "github", + loaderName: "GithubFileLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.github.GithubFileLoader.html" + } + ] + }, + social_loaders: { + link: 'docs/integrations/loaders', + columns: [ + {title: "Document Loader", formatter: (item) => {item.name}}, + {title: "API reference", formatter: (item) => {item.loaderName}}, + ], + items: [ + { + name: "Twitter", + link: "twitter", + loaderName: "TwitterTweetLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.twitter.TwitterTweetLoader.html" + + }, + { + name: "Reddit", + link: "reddit", + loaderName: "RedditPostsLoader", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.reddit.RedditPostsLoader.html" + }, + ] + }, webpage_loaders: { link: 'docs/integrations/loaders', columns: [ @@ -606,6 +866,12 @@ const FEATURE_TABLES = { link: "bshtml", source: "HTML files", apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.html_bs.BSHTMLLoader.html" + }, + { + name: "UnstructuredXMLLoader", + link:
"xml", + source: "XML files", + apiLink: "https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.xml.UnstructuredXMLLoader.html" } ] }, diff --git a/docs/vercel.json b/docs/vercel.json index 6afbee9e5e1..5c91cb3fb3b 100644 --- a/docs/vercel.json +++ b/docs/vercel.json @@ -102,6 +102,10 @@ "source": "/v0.2/docs/integrations/toolkits/xorbits/", "destination": "/v0.2/docs/integrations/tools#search" }, + { + "source": "/v0.2/docs/integrations/document_loaders/notiondb/", + "destination": "/v0.2/docs/integrations/document_loaders/notion/" + }, { "source": "/v0.2/docs/integrations/chat/ollama_functions/", "destination": "https://python.langchain.com/v0.1/docs/integrations/chat/ollama_functions/"
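Step 4 of the Notion setup in this patch asks the reader to pick the database ID out of a copied link by eye. Below is a minimal Python sketch of that step. The helper name `extract_notion_database_id` is hypothetical and is not part of `langchain_community`. The sketch assumes, as in the notebook's example URL, that the ID is the 32-character hexadecimal run at the end of the last path segment, possibly preceded by a page-title slug. The dashed UUID form is not handled.

```python
import re
from urllib.parse import urlparse

# Assumption: a copied Notion database link ends its path with a 32-character
# hex ID, optionally preceded by a title slug such as "My-Database-".
_DATABASE_ID_RE = re.compile(r"[0-9a-fA-F]{32}$")


def extract_notion_database_id(url: str) -> str:
    """Return the database ID from a copied Notion database link (hypothetical helper)."""
    path = urlparse(url).path  # urlparse separates out the "?v=..." view query string
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    match = _DATABASE_ID_RE.search(last_segment)
    if match is None:
        raise ValueError(f"no 32-character database ID found in {url!r}")
    return match.group(0)


if __name__ == "__main__":
    url = "https://www.notion.so/username/8935f9d140a04f95a872520c4f123456?v=...."
    print(extract_notion_database_id(url))  # 8935f9d140a04f95a872520c4f123456
```

The returned string is the value the notebook passes as `database_id` to `NotionDBLoader`.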