docs: Clean up Diffbot docs (#21781)

The Diffbot DocumentLoader page doesn't actually run for a number of reasons. This PR fixes it along with some light details on the Graph Transformer and Provider pages.

## Full Changelog

[Document Loader Page](https://python.langchain.com/v0.1/docs/integrations/document_loaders/diffbot/)
* Fixed the notebook so that it actually runs (missing required modules, env variables, etc.)
* Added "open in colab" button like the Graph Transformer page

[Graph Transformer Page](https://python.langchain.com/v0.2/docs/integrations/graphs/diffbot/)
* Fixed broken colab link
* Moved "open in colab" button to below description so the description in the [Graphs category page](https://python.langchain.com/v0.2/docs/integrations/graphs/) shows up correctly

[Provider Page](https://python.langchain.com/v0.2/docs/integrations/providers/diffbot/)
* Clarified explanations of Diffbot products
* Added section and link to LangChain Graph Transformer page

---------

Co-authored-by: jeromechoo <hello@jeromechoo.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2025-06-23 07:09:31 +00:00 · 2024-05-20 18:09:22 -05:00 · 2316635add
commit 2316635add
parent d8a101074f
3 changed files with 167 additions and 40 deletions
--- a/docs/docs/integrations/document_loaders/diffbot.ipynb
+++ b/docs/docs/integrations/document_loaders/diffbot.ipynb
--- a/docs/docs/integrations/graphs/diffbot.ipynb
+++ b/docs/docs/integrations/graphs/diffbot.ipynb
@@ -7,24 +7,21 @@
   "source": [
    "# Diffbot\n",
    "\n",
-    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/graph/diffbot_graphtransformer.ipynb)\n",
-    "\n",
-    ">[Diffbot](https://docs.diffbot.com/docs/getting-started-with-diffbot) is a suite of products that make it easy to integrate and research data on the web.\n",
+    ">[Diffbot](https://docs.diffbot.com/docs/getting-started-with-diffbot) is a suite of ML-based products that make it easy to structure web data.\n",
    ">\n",
-    ">[The Diffbot Knowledge Graph](https://docs.diffbot.com/docs/getting-started-with-diffbot-knowledge-graph) is a self-updating graph database of the public web.\n",
+    ">Diffbot's [Natural Language Processing API](https://www.diffbot.com/products/natural-language/) allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.",
    "\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/graphs/diffbot.ipynb)\n",
    "\n",
    "## Use case\n",
    "\n",
    "Text data often contain rich relationships and insights used for various analytics, recommendation engines, or knowledge management applications.\n",
    "\n",
-    "`Diffbot's NLP API` allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.\n",
-    "\n",
    "By coupling `Diffbot's NLP API` with `Neo4j`, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.\n",
    "\n",
    "This combination allows for use cases such as:\n",
    "\n",
-    "* Building knowledge graphs from textual documents, websites, or social media feeds.\n",
+    "* Building knowledge graphs (like [Diffbot's Knowledge Graph](https://www.diffbot.com/products/knowledge-graph/)) from textual documents, websites, or social media feeds.\n",
    "* Generating recommendations based on semantic relationships in the data.\n",
    "* Creating advanced search features that understand the relationships between entities.\n",
    "* Building analytics dashboards that allow users to explore the hidden relationships in data.\n",
@@ -57,11 +54,11 @@
   "id": "77718977-629e-46c2-b091-f9191b9ec569",
   "metadata": {},
   "source": [
-    "### Diffbot NLP Service\n",
+    "### Diffbot NLP API\n",
    "\n",
-    "`Diffbot's NLP` service is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
+    "`Diffbot's NLP API` is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
    "This extracted information can be used to construct a knowledge graph.\n",
-    "To use their service, you'll need to obtain an API key from [Diffbot](https://www.diffbot.com/products/natural-language/)."
+    "To use the API, you'll need to obtain a [free API token from Diffbot](https://app.diffbot.com/get-started/)."
   ]
  },
  {
@@ -73,8 +70,8 @@
   "source": [
    "from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer\n",
    "\n",
-    "diffbot_api_key = \"DIFFBOT_API_KEY\"\n",
-    "diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)"
+    "diffbot_api_token = \"DIFFBOT_API_TOKEN\"\n",
+    "diffbot_nlp = DiffbotGraphTransformer(diffbot_api_token=diffbot_api_token)"
   ]
  },
  {
--- a/docs/docs/integrations/providers/diffbot.mdx
+++ b/docs/docs/integrations/providers/diffbot.mdx
@@ -1,18 +1,29 @@
 # Diffbot

->[Diffbot](https://docs.diffbot.com/docs) is a service to read web pages. Unlike traditional web scraping tools, 
-> `Diffbot` doesn't require any rules to read the content on a page.
->It starts with computer vision, which classifies a page into one of 20 possible types. Content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type.
->The result is a website transformed into clean-structured data (like JSON or CSV), ready for your application.
+> [Diffbot](https://docs.diffbot.com/docs) is a suite of ML-based products that make it easy to structure and integrate web data.

 ## Installation and Setup

-Read [instructions](https://docs.diffbot.com/reference/authentication) how to get the Diffbot API Token.
+[Get a free Diffbot API token](https://app.diffbot.com/get-started/) and [follow these instructions](https://docs.diffbot.com/reference/authentication) to authenticate your requests.

 ## Document Loader

+Diffbot's [Extract API](https://docs.diffbot.com/reference/extract-introduction) is a service that structures and normalizes data from web pages. 
+
+Unlike traditional web scraping tools, `Diffbot Extract` doesn't require any rules to read the content on a page. It uses a computer vision model to classify a page into one of 20 possible types, and then transforms raw HTML markup into JSON. The resulting structured JSON follows a consistent [type-based ontology](https://docs.diffbot.com/docs/ontology), which makes it easy to extract data from multiple different web sources with the same schema. 
+
 See a [usage example](/docs/integrations/document_loaders/diffbot).

 ```python
 from langchain_community.document_loaders import DiffbotLoader
 ```
+
+## Graphs
+
+Diffbot's [Natural Language Processing API](https://www.diffbot.com/products/natural-language/) allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.
+
+See a [usage example](/docs/integrations/graphs/diffbot).
+
+```python
+from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
+```
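
The changelog mentions missing env variables as one reason the Document Loader notebook did not run, and the Graph Transformer hunk hard-codes a placeholder token string. As a minimal sketch of an alternative, the helper below reads the token from the environment and fails early when it is absent. The function name `get_diffbot_token` and the variable name `DIFFBOT_API_TOKEN` are assumptions made for illustration; they are not taken from this commit or mandated by LangChain or Diffbot.

```python
import os


def get_diffbot_token(env_var: str = "DIFFBOT_API_TOKEN") -> str:
    """Return the Diffbot API token stored in the given environment variable.

    The variable name is a hypothetical convention for this sketch, not
    something LangChain or Diffbot requires.
    """
    # Treat a missing variable and a whitespace-only value the same way.
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a readable message instead of a later auth error.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Diffbot API token."
        )
    return token
```

The returned string could then be passed wherever the notebooks expect a token, for example as the `api_token` argument of `DiffbotLoader` alongside `urls`, which keeps the secret out of the committed notebook.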