docs: ecosystem/integrations update 2 (#5282)

# docs: ecosystem/integrations update 2

#5219 - part 1 
The second part of this update (parts are independent of each other! no
overlap):

- added diffbot.md
- updated confluence.ipynb; added confluence.md
- updated college_confidential.md
- updated openai.md
- added blackboard.md
- added bilibili.md
- added azure_blob_storage.md
- added azlyrics.md
- added aws_s3.md

## Who can review?

@hwchase17@agola11
@agola11
 @vowelparrot
 @dev2049
This commit is contained in:
Leonid Ganeline
2023-05-29 07:19:43 -07:00
committed by GitHub
parent ccb6238de1
commit a3598193a0
11 changed files with 212 additions and 22 deletions

View File

@@ -11,7 +11,7 @@
">It starts with computer vision, which classifies a page into one of 20 possible types. Content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type.\n",
">The result is a website transformed into clean structured data (like JSON or CSV), ready for your application.\n",
"\n",
"This covers how to extract HTML documents from a list of URLs using the [Diffbot extract API](https://www.diffbot.com/products/extract/), into a document format that we can use downstream."
"This covers how to extract HTML documents from a list of URLs using the [Diffbot extract API](https://www.diffbot.com/products/extract/), into a document format that we can use downstream.\n"
]
},
{
@@ -31,7 +31,9 @@
"id": "6fffec88",
"metadata": {},
"source": [
"The Diffbot Extract API Requires an API token. Once you have it, you can extract the data from the previous URLs\n"
"The Diffbot Extract API Requires an API token. Once you have it, you can extract the data.\n",
"\n",
"Read [instructions](https://docs.diffbot.com/reference/authentication) how to get the Diffbot API Token."
]
},
{