mirror of https://github.com/hwchase17/langchain.git (synced 2025-06-23 07:09:31 +00:00)
docs: Clean up Diffbot docs (#21781)
The Diffbot DocumentLoader page doesn't actually run for a number of reasons. This PR fixes it along with some light details on the Graph Transformer and Provider pages.

## Full Changelog

[Document Loader Page](https://python.langchain.com/v0.1/docs/integrations/document_loaders/diffbot/)
* Fixed the notebook so that it actually runs (missing required modules, env variables, etc.)
* Added "open in colab" button like the Graph Transformer page

[Graph Transformer Page](https://python.langchain.com/v0.2/docs/integrations/graphs/diffbot/)
* Fixed broken colab link
* Moved "open in colab" button to below description so the description in the [Graphs category page](https://python.langchain.com/v0.2/docs/integrations/graphs/) shows up correctly

[Provider Page](https://python.langchain.com/v0.2/docs/integrations/providers/diffbot/)
* Clarified explanations of Diffbot products
* Added section and link to LangChain Graph Transformer page

---------

Co-authored-by: jeromechoo <hello@jeromechoo.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
parent d8a101074f
commit 2316635add
File diff suppressed because one or more lines are too long
@@ -7,24 +7,21 @@
"source": [
"# Diffbot\n",
"\n",
"[](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/graph/diffbot_graphtransformer.ipynb)\n",
"\n",
">[Diffbot](https://docs.diffbot.com/docs/getting-started-with-diffbot) is a suite of products that make it easy to integrate and research data on the web.\n",
">[Diffbot](https://docs.diffbot.com/docs/getting-started-with-diffbot) is a suite of ML-based products that make it easy to structure web data.\n",
">\n",
">[The Diffbot Knowledge Graph](https://docs.diffbot.com/docs/getting-started-with-diffbot-knowledge-graph) is a self-updating graph database of the public web.\n",
">Diffbot's [Natural Language Processing API](https://www.diffbot.com/products/natural-language/) allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.",
"\n",
"[](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/graphs/diffbot.ipynb)\n",
"\n",
"## Use case\n",
"\n",
"Text data often contain rich relationships and insights used for various analytics, recommendation engines, or knowledge management applications.\n",
"\n",
"`Diffbot's NLP API` allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.\n",
"\n",
"By coupling `Diffbot's NLP API` with `Neo4j`, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.\n",
"\n",
"This combination allows for use cases such as:\n",
"\n",
"* Building knowledge graphs from textual documents, websites, or social media feeds.\n",
"* Building knowledge graphs (like [Diffbot's Knowledge Graph](https://www.diffbot.com/products/knowledge-graph/)) from textual documents, websites, or social media feeds.\n",
"* Generating recommendations based on semantic relationships in the data.\n",
"* Creating advanced search features that understand the relationships between entities.\n",
"* Building analytics dashboards that allow users to explore the hidden relationships in data.\n",
@@ -57,11 +54,11 @@
"id": "77718977-629e-46c2-b091-f9191b9ec569",
"metadata": {},
"source": [
"### Diffbot NLP Service\n",
"### Diffbot NLP API\n",
"\n",
"`Diffbot's NLP` service is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
"`Diffbot's NLP API` is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
"This extracted information can be used to construct a knowledge graph.\n",
"To use their service, you'll need to obtain an API key from [Diffbot](https://www.diffbot.com/products/natural-language/)."
"To use the API, you'll need to obtain a [free API token from Diffbot](https://app.diffbot.com/get-started/)."
]
},
{
@@ -73,8 +70,8 @@
"source": [
"from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer\n",
"\n",
"diffbot_api_key = \"DIFFBOT_API_KEY\"\n",
"diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)"
"diffbot_api_token = \"DIFFBOT_API_TOKEN\"\n",
"diffbot_nlp = DiffbotGraphTransformer(diffbot_api_token=diffbot_api_token)"
]
},
{
@@ -1,18 +1,29 @@
# Diffbot

>[Diffbot](https://docs.diffbot.com/docs) is a service to read web pages. Unlike traditional web scraping tools,
> `Diffbot` doesn't require any rules to read the content on a page.
>It starts with computer vision, which classifies a page into one of 20 possible types. Content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type.
>The result is a website transformed into clean-structured data (like JSON or CSV), ready for your application.
> [Diffbot](https://docs.diffbot.com/docs) is a suite of ML-based products that make it easy to structure and integrate web data.

## Installation and Setup

Read [instructions](https://docs.diffbot.com/reference/authentication) how to get the Diffbot API Token.
[Get a free Diffbot API token](https://app.diffbot.com/get-started/) and [follow these instructions](https://docs.diffbot.com/reference/authentication) to authenticate your requests.

## Document Loader

Diffbot's [Extract API](https://docs.diffbot.com/reference/extract-introduction) is a service that structures and normalizes data from web pages.

Unlike traditional web scraping tools, `Diffbot Extract` doesn't require any rules to read the content on a page. It uses a computer vision model to classify a page into one of 20 possible types, and then transforms raw HTML markup into JSON. The resulting structured JSON follows a consistent [type-based ontology](https://docs.diffbot.com/docs/ontology), which makes it easy to extract data from multiple different web sources with the same schema.

See a [usage example](/docs/integrations/document_loaders/diffbot).

```python
from langchain_community.document_loaders import DiffbotLoader
```

## Graphs

Diffbot's [Natural Language Processing API](https://www.diffbot.com/products/natural-language/) allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.

See a [usage example](/docs/integrations/graphs/diffbot).

```python
from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
```