Mirror of https://github.com/hwchase17/langchain.git, synced 2025-06-24 07:35:18 +00:00
docs: Clean up Diffbot docs (#21781)
The Diffbot DocumentLoader page doesn't actually run for a number of reasons. This PR fixes it along with some light edits to the Graph Transformer and Provider pages.

## Full Changelog

[Document Loader Page](https://python.langchain.com/v0.1/docs/integrations/document_loaders/diffbot/)
* Fixed the notebook so that it actually runs (missing required modules, env variables, etc.)
* Added "open in colab" button like the Graph Transformer page

[Graph Transformer Page](https://python.langchain.com/v0.2/docs/integrations/graphs/diffbot/)
* Fixed broken colab link
* Moved "open in colab" button to below description so the description in the [Graphs category page](https://python.langchain.com/v0.2/docs/integrations/graphs/) shows up correctly

[Provider Page](https://python.langchain.com/v0.2/docs/integrations/providers/diffbot/)
* Clarified explanations of Diffbot products
* Added section and link to LangChain Graph Transformer page

---------

Co-authored-by: jeromechoo <hello@jeromechoo.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
parent d8a101074f
commit 2316635add
File diff suppressed because one or more lines are too long
@@ -7,24 +7,21 @@
 "source": [
 "# Diffbot\n",
 "\n",
-"[](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/graph/diffbot_graphtransformer.ipynb)\n",
-"\n",
-">[Diffbot](https://docs.diffbot.com/docs/getting-started-with-diffbot) is a suite of products that make it easy to integrate and research data on the web.\n",
+">[Diffbot](https://docs.diffbot.com/docs/getting-started-with-diffbot) is a suite of ML-based products that make it easy to structure web data.\n",
 ">\n",
-">[The Diffbot Knowledge Graph](https://docs.diffbot.com/docs/getting-started-with-diffbot-knowledge-graph) is a self-updating graph database of the public web.\n",
+">Diffbot's [Natural Language Processing API](https://www.diffbot.com/products/natural-language/) allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.",
 "\n",
+"[](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/graphs/diffbot.ipynb)\n",
 "\n",
 "## Use case\n",
 "\n",
 "Text data often contain rich relationships and insights used for various analytics, recommendation engines, or knowledge management applications.\n",
 "\n",
-"`Diffbot's NLP API` allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.\n",
-"\n",
 "By coupling `Diffbot's NLP API` with `Neo4j`, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.\n",
 "\n",
 "This combination allows for use cases such as:\n",
 "\n",
-"* Building knowledge graphs from textual documents, websites, or social media feeds.\n",
+"* Building knowledge graphs (like [Diffbot's Knowledge Graph](https://www.diffbot.com/products/knowledge-graph/)) from textual documents, websites, or social media feeds.\n",
 "* Generating recommendations based on semantic relationships in the data.\n",
 "* Creating advanced search features that understand the relationships between entities.\n",
 "* Building analytics dashboards that allow users to explore the hidden relationships in data.\n",
@@ -57,11 +54,11 @@
 "id": "77718977-629e-46c2-b091-f9191b9ec569",
 "metadata": {},
 "source": [
-"### Diffbot NLP Service\n",
+"### Diffbot NLP API\n",
 "\n",
-"`Diffbot's NLP` service is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
+"`Diffbot's NLP API` is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
 "This extracted information can be used to construct a knowledge graph.\n",
-"To use their service, you'll need to obtain an API key from [Diffbot](https://www.diffbot.com/products/natural-language/)."
+"To use the API, you'll need to obtain a [free API token from Diffbot](https://app.diffbot.com/get-started/)."
 ]
 },
 {
@@ -73,8 +70,8 @@
 "source": [
 "from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer\n",
 "\n",
-"diffbot_api_key = \"DIFFBOT_API_KEY\"\n",
-"diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)"
+"diffbot_api_token = \"DIFFBOT_API_TOKEN\"\n",
+"diffbot_nlp = DiffbotGraphTransformer(diffbot_api_token=diffbot_api_token)"
 ]
 },
 {
@@ -1,18 +1,29 @@
 # Diffbot
 
->[Diffbot](https://docs.diffbot.com/docs) is a service to read web pages. Unlike traditional web scraping tools,
-> `Diffbot` doesn't require any rules to read the content on a page.
->It starts with computer vision, which classifies a page into one of 20 possible types. Content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type.
->The result is a website transformed into clean-structured data (like JSON or CSV), ready for your application.
+> [Diffbot](https://docs.diffbot.com/docs) is a suite of ML-based products that make it easy to structure and integrate web data.
 
 ## Installation and Setup
 
-Read [instructions](https://docs.diffbot.com/reference/authentication) how to get the Diffbot API Token.
+[Get a free Diffbot API token](https://app.diffbot.com/get-started/) and [follow these instructions](https://docs.diffbot.com/reference/authentication) to authenticate your requests.
 
 ## Document Loader
 
+Diffbot's [Extract API](https://docs.diffbot.com/reference/extract-introduction) is a service that structures and normalizes data from web pages.
+
+Unlike traditional web scraping tools, `Diffbot Extract` doesn't require any rules to read the content on a page. It uses a computer vision model to classify a page into one of 20 possible types, and then transforms raw HTML markup into JSON. The resulting structured JSON follows a consistent [type-based ontology](https://docs.diffbot.com/docs/ontology), which makes it easy to extract data from multiple different web sources with the same schema.
+
 See a [usage example](/docs/integrations/document_loaders/diffbot).
 
 ```python
 from langchain_community.document_loaders import DiffbotLoader
 ```
+
+## Graphs
+
+Diffbot's [Natural Language Processing API](https://www.diffbot.com/products/natural-language/) allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.
+
+See a [usage example](/docs/integrations/graphs/diffbot).
+
+```python
+from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
+```
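The updated provider page says every request needs a Diffbot API token and that the Extract API classifies a page before turning it into JSON. As a minimal sketch of what such a request looks like outside LangChain, the snippet below reads the token from an environment variable and builds a GET URL for Diffbot's Analyze endpoint. The environment variable name `DIFFBOT_API_TOKEN`, the endpoint URL, and the helper names `get_diffbot_token` and `build_analyze_url` are assumptions for illustration; check them against Diffbot's API reference before relying on them.

```python
import os
from urllib.parse import urlencode

# Assumption: base URL of Diffbot's Analyze endpoint, which classifies a
# page's type before extracting it. Verify against Diffbot's API reference.
DIFFBOT_ANALYZE_URL = "https://api.diffbot.com/v3/analyze"


def get_diffbot_token(env_var: str = "DIFFBOT_API_TOKEN") -> str:
    """Return the Diffbot API token stored in `env_var`.

    The default variable name is an assumption for this sketch. A missing or
    empty variable raises RuntimeError with a readable message instead of a
    bare KeyError.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Diffbot API token."
        )
    return token


def build_analyze_url(page_url: str, token: str) -> str:
    """Build a GET URL asking Diffbot to analyze `page_url`.

    urlencode percent-escapes the target URL so that its own query string
    does not collide with the token and url parameters.
    """
    query = urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_ANALYZE_URL}?{query}"
```

The sketch stops at building the URL so it runs without network access or a real token; any HTTP client, such as `urllib.request.urlopen`, could then send the request and receive the extracted JSON.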