mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-20 18:12:35 +00:00
docs: add ZenRows provider and tool integration docs (#31742)
**Description:** Adds documentation for ZenRows integration with LangChain, including provider overview and detailed tool documentation.

ZenRows is an enterprise-grade web scraping solution that enables LangChain agents to extract web content at scale with advanced features like JavaScript rendering, anti-bot bypass, geo-targeting, and multiple output formats.

This PR includes:

- Provider documentation (`docs/docs/integrations/providers/zenrows.ipynb`)
- Tool documentation (`docs/docs/integrations/tools/zenrows_universal_scraper.ipynb`)
- Complete usage examples and API reference links

**Issue:** N/A

**Dependencies:**

- [langchain-zenrows](https://github.com/ZenRows-Hub/langchain-zenrows) package (external, available on [PyPI](https://pypi.org/project/langchain-zenrows/))
- No changes to core LangChain dependencies

**LinkedIn handle:** https://www.linkedin.com/company/zenrows/

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
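The features named in the description (JavaScript rendering, premium proxies, geo-targeting, output formats) are selected through keys in the tool's input dict. The sketch below shows one way such an input could be assembled. The key names (`js_render`, `premium_proxy`, `proxy_country`, `response_type`, `wait`) come from the tool notebook in this PR; `build_scrape_input` is a hypothetical helper written for illustration and is not part of `langchain-zenrows`.

```python
from typing import Any, Dict, Optional


def build_scrape_input(
    url: str,
    js_render: Optional[bool] = None,
    premium_proxy: Optional[bool] = None,
    proxy_country: Optional[str] = None,
    response_type: Optional[str] = None,
    wait: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble an input dict for the scraper tool, omitting unset options.

    Hypothetical helper: it only builds a dict and makes no network calls.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")

    options = {
        "js_render": js_render,
        "premium_proxy": premium_proxy,
        "proxy_country": proxy_country,
        "response_type": response_type,
        "wait": wait,
    }
    payload: Dict[str, Any] = {"url": url}
    # Keep only the options the caller actually set.
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload


payload = build_scrape_input(
    "https://www.scrapingcourse.com/ecommerce/",
    js_render=True,
    premium_proxy=True,
    proxy_country="us",
    response_type="markdown",
    wait=2000,
)
print(payload)
```

The resulting dict has the same shape as the one passed to `zenrows_scraper_tool.invoke(...)` in the tool notebook's advanced usage cell.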
68
docs/docs/integrations/providers/zenrows.ipynb
Normal file
@@ -0,0 +1,68 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_MFfVhVCa15x"
   },
   "source": [
    "# ZenRows\n",
    "\n",
    "[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool for extracting web data at scale. It specializes in scraping modern websites, bypassing anti-bot systems, extracting structured data, rendering JavaScript-heavy content, and accessing geo-restricted websites.\n",
    "\n",
    "[langchain-zenrows](https://pypi.org/project/langchain-zenrows/) provides tools that allow LLMs to access web data through ZenRows' scraping infrastructure.\n",
    "\n",
    "## Installation and Setup\n",
    "\n",
    "```bash\n",
    "pip install langchain-zenrows\n",
    "```\n",
    "\n",
    "Set your ZenRows API key as an environment variable:\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"your-api-key\"\n",
    "```\n",
    "\n",
    "Alternatively, pass it directly when initializing the tool:\n",
    "\n",
    "```python\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")\n",
    "```\n",
    "\n",
    "## Tools\n",
    "\n",
    "### ZenRowsUniversalScraper\n",
    "\n",
    "The `ZenRowsUniversalScraper` tool provides the following web scraping features:\n",
    "\n",
    "- **JavaScript Rendering**: Scrape modern SPAs and dynamic content\n",
    "- **Anti-Bot Bypass**: Overcome sophisticated bot detection systems\n",
    "- **Geo-Targeting**: Access region-specific content from 190+ countries\n",
    "- **Multiple Output Formats**: HTML, Markdown, Plaintext, PDF, Screenshots\n",
    "- **CSS Extraction**: Target specific data with CSS selectors\n",
    "- **Structured Data Extraction**: Automatically extract emails, phone numbers, links, and more\n",
    "- **Session Management**: Maintain consistent sessions across requests\n",
    "- **Premium Proxies**: Residential IPs for higher success rates\n",
    "\n",
    "See more in the [ZenRows tool documentation](/docs/integrations/tools/zenrows_universal_scraper)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
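The provider notebook above describes two ways to supply the API key: the `ZENROWS_API_KEY` environment variable or the `zenrows_api_key` argument. A minimal sketch of that lookup follows. `resolve_zenrows_api_key` is a hypothetical helper, and the precedence it applies (explicit argument first, then the environment) is an assumption made for illustration, not confirmed behavior of `langchain-zenrows`.

```python
import os
from typing import Optional

ENV_VAR = "ZENROWS_API_KEY"  # variable name used in the notebook


def resolve_zenrows_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise read the environment.

    Hypothetical helper; the precedence order is an assumption.
    """
    key = explicit_key or os.environ.get(ENV_VAR)
    if not key:
        raise RuntimeError(
            f"No ZenRows API key found: pass one explicitly or set {ENV_VAR}."
        )
    return key


# Placeholder values for demonstration only; never hard-code a real key.
os.environ[ENV_VAR] = "key-from-env"
print(resolve_zenrows_api_key())
print(resolve_zenrows_api_key("key-from-arg"))
```

Failing early with a clear message when neither source is set tends to be easier to debug than an authentication error returned by the first scrape request.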
281
docs/docs/integrations/tools/zenrows_universal_scraper.ipynb
Normal file
@@ -0,0 +1,281 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FVo_qZB6crBs"
   },
   "source": [
    "# ZenRowsUniversalScraper\n",
    "\n",
    "[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool for extracting web data at scale. For more information about ZenRows and its Universal Scraper API, visit the [official documentation](https://docs.zenrows.com/universal-scraper-api/).\n",
    "\n",
    "This document provides a quick overview for getting started with the `ZenRowsUniversalScraper` tool. For detailed documentation of all `ZenRowsUniversalScraper` features and configurations, head to the [API reference](https://github.com/ZenRows-Hub/langchain-zenrows?tab=readme-ov-file#api-reference).\n",
    "\n",
    "## Overview\n",
    "\n",
    "### Integration details\n",
    "\n",
    "| Class | Package | JS support | Package latest |\n",
    "| :--- | :--- | :---: | :---: |\n",
    "| [ZenRowsUniversalScraper](https://pypi.org/project/langchain-zenrows/) | [langchain-zenrows](https://pypi.org/project/langchain-zenrows/) | ❌ |  |\n",
    "\n",
    "### Tool features\n",
    "\n",
    "| Feature | Support |\n",
    "| :--- | :---: |\n",
    "| **JavaScript Rendering** | ✅ |\n",
    "| **Anti-Bot Bypass** | ✅ |\n",
    "| **Geo-Targeting** | ✅ |\n",
    "| **Multiple Output Formats** | ✅ |\n",
    "| **CSS Extraction** | ✅ |\n",
    "| **Screenshot Capture** | ✅ |\n",
    "| **Session Management** | ✅ |\n",
    "| **Premium Proxies** | ✅ |\n",
    "\n",
    "## Setup\n",
    "\n",
    "Install the required dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "id": "henNSgOlcww5"
   },
   "outputs": [],
   "source": [
    "%pip install langchain-zenrows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IS2yw_UaczgP"
   },
   "source": [
    "### Credentials\n",
    "\n",
    "You'll need a ZenRows API key to use this tool. You can sign up for free at [ZenRows](https://app.zenrows.com/register?prod=universal_scraper)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "Z097qruic2iH"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hB7fHgmQc5eh"
   },
   "source": [
    "## Instantiation\n",
    "\n",
    "Here's how to instantiate the `ZenRowsUniversalScraper` tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "ezdGcI3Hc8H3"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cUal-Ioic_0k"
   },
   "source": [
    "You can also pass the ZenRows API key directly when initializing the `ZenRowsUniversalScraper` tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "sPd95HKzdCGr"
   },
   "outputs": [],
   "source": [
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"<YOUR_ZENROWS_API_KEY>\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c8rEvAY4dFX2"
   },
   "source": [
    "## Invocation\n",
    "\n",
    "### Basic Usage\n",
    "\n",
    "The tool takes a URL plus optional parameters that customize the scraping behavior:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GKTDKhXEdGku"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "# Initialize the tool\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Scrape a simple webpage\n",
    "result = zenrows_scraper_tool.invoke({\"url\": \"https://httpbin.io/html\"})\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7Kd1loN5dJbt"
   },
   "source": [
    "### Advanced Usage with Parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NfJOQdBhdLrp"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Scrape with JavaScript rendering and premium proxies\n",
    "result = zenrows_scraper_tool.invoke(\n",
    "    {\n",
    "        \"url\": \"https://www.scrapingcourse.com/ecommerce/\",\n",
    "        \"js_render\": True,\n",
    "        \"premium_proxy\": True,\n",
    "        \"proxy_country\": \"us\",\n",
    "        \"response_type\": \"markdown\",\n",
    "        \"wait\": 2000,\n",
    "    }\n",
    ")\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8eivshtqdNe0"
   },
   "source": [
    "### Use within an agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "JmbPF7xadPgK"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_openai import ChatOpenAI  # or your preferred LLM\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "from langgraph.prebuilt import create_react_agent\n",
    "\n",
    "# Set your ZenRows and OpenAI API keys\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_OPENAI_API_KEY>\"\n",
    "\n",
    "# Initialize components\n",
    "llm = ChatOpenAI(model=\"gpt-4o-mini\")\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Create agent\n",
    "agent = create_react_agent(llm, [zenrows_scraper_tool])\n",
    "\n",
    "# Use the agent\n",
    "result = agent.invoke(\n",
    "    {\n",
    "        \"messages\": \"Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time.\"\n",
    "    }\n",
    ")\n",
    "\n",
    "print(\"Agent Response:\")\n",
    "for message in result[\"messages\"]:\n",
    "    print(message.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "k9lqlhoAdRSb"
   },
   "source": [
    "## API reference\n",
    "\n",
    "For detailed documentation of all `ZenRowsUniversalScraper` features and configurations, head to the [**ZenRowsUniversalScraper API reference**](https://github.com/ZenRows-Hub/langchain-zenrows).\n",
    "\n",
    "For comprehensive information about the underlying API parameters and capabilities, see the [ZenRows Universal Scraper API documentation](https://docs.zenrows.com/universal-scraper-api/api-reference)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
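The agent cell in the notebook above prints every message in `result["messages"]`, which includes the user prompt and the raw scraped page. When only the model's final answer is wanted, reading the last message is usually enough. The sketch below illustrates this with a stand-in message class so it runs without API keys. `StubMessage` and `final_answer` are hypothetical names, and the only assumption carried over from the notebook is that each message exposes a `.content` attribute.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StubMessage:
    """Stand-in for a LangChain message; only `.content` is modeled."""

    content: str


def final_answer(result: Dict[str, List[Any]]) -> str:
    """Return the content of the last message in an agent result."""
    messages = result.get("messages", [])
    if not messages:
        raise ValueError("agent result contains no messages")
    return messages[-1].content


# Simulated result with the same shape the notebook iterates over.
stub_result = {
    "messages": [
        StubMessage("Scrape https://news.ycombinator.com/ and list the top 3 stories."),
        StubMessage(""),  # a tool-calling turn may carry no text content
        StubMessage("<scraped page content>"),
        StubMessage("<final summary written by the model>"),
    ]
}
print(final_answer(stub_result))
```

With a real agent run, the same function would be called as `final_answer(result)` on the value returned by `agent.invoke(...)`.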
@@ -729,8 +729,10 @@ packages:
 - name: langchain-scrapeless
   repo: scrapeless-ai/langchain-scrapeless
   path: .
+- name: langchain-zenrows
+  path: .
+  repo: ZenRows-Hub/langchain-zenrows
 - name: langchain-oci
   name_title: Oracle Cloud Infrastructure (OCI)
   repo: oracle/langchain-oracle
   path: .