docs: ecosystem/integrations update 2 (#5282)

# docs: ecosystem/integrations update 2

#5219 - part 1 
The second part of this update (parts are independent of each other! no
overlap):

- added diffbot.md
- updated confluence.ipynb; added confluence.md
- updated college_confidential.md
- updated openai.md
- added blackboard.md
- added bilibili.md
- added azure_blob_storage.md
- added azlyrics.md
- added aws_s3.md

## Who can review?

@hwchase17@agola11
@agola11
 @vowelparrot
 @dev2049
This commit is contained in:
Leonid Ganeline
2023-05-29 07:19:43 -07:00
committed by GitHub
parent ccb6238de1
commit a3598193a0
11 changed files with 212 additions and 22 deletions

View File

@@ -8,13 +8,11 @@
"\n",
">[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform that saves and organizes all of the project-related material. `Confluence` is a knowledge base that primarily handles content management activities. \n",
"\n",
"A loader for `Confluence` pages.\n",
"A loader for `Confluence` pages currently supports both `username/api_key` and `Oauth2 login`.\n",
"See [instructions](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).\n",
"\n",
"\n",
"This currently supports both `username/api_key` and `Oauth2 login`.\n",
"\n",
"\n",
"Specify a list page_ids and/or space_key to load in the corresponding pages into Document objects, if both are specified the union of both sets will be returned.\n",
"Specify a list `page_id`-s and/or `space_key` to load in the corresponding pages into Document objects, if both are specified the union of both sets will be returned.\n",
"\n",
"\n",
"You can also specify a boolean `include_attachments` to include attachments, this is set to False by default, if set to True all attachments will be downloaded and ConfluenceReader will extract the text from the attachments and add it to the Document object. Currently supported attachment types are: `PDF`, `PNG`, `JPEG/JPG`, `SVG`, `Word` and `Excel`.\n",

View File

@@ -11,7 +11,7 @@
">It starts with computer vision, which classifies a page into one of 20 possible types. Content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type.\n",
">The result is a website transformed into clean structured data (like JSON or CSV), ready for your application.\n",
"\n",
"This covers how to extract HTML documents from a list of URLs using the [Diffbot extract API](https://www.diffbot.com/products/extract/), into a document format that we can use downstream."
"This covers how to extract HTML documents from a list of URLs using the [Diffbot extract API](https://www.diffbot.com/products/extract/), into a document format that we can use downstream.\n"
]
},
{
@@ -31,7 +31,9 @@
"id": "6fffec88",
"metadata": {},
"source": [
"The Diffbot Extract API Requires an API token. Once you have it, you can extract the data from the previous URLs\n"
"The Diffbot Extract API Requires an API token. Once you have it, you can extract the data.\n",
"\n",
"Read [instructions](https://docs.diffbot.com/reference/authentication) how to get the Diffbot API Token."
]
},
{