docs: add ZenRows provider and tool integration docs (#31742)

**Description:** Adds documentation for the ZenRows integration with
LangChain, including a provider overview and detailed tool documentation.
ZenRows is an enterprise-grade web scraping solution that enables
LangChain agents to extract web content at scale with advanced features
like JavaScript rendering, anti-bot bypass, geo-targeting, and multiple
output formats.

This PR includes:
- Provider documentation (`docs/docs/integrations/providers/zenrows.ipynb`)
- Tool documentation (`docs/docs/integrations/tools/zenrows_universal_scraper.ipynb`)
- Complete usage examples and API reference links

**Issue:** N/A

**Dependencies:**
- [langchain-zenrows](https://github.com/ZenRows-Hub/langchain-zenrows) package (external, available on [PyPI](https://pypi.org/project/langchain-zenrows/))
- No changes to core LangChain dependencies

**LinkedIn handle:** https://www.linkedin.com/company/zenrows/

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
Yuvraj Chandra
2025-09-13 02:07:49 +05:30
committed by GitHub
parent f11dd177e9
commit 3420ca1da2
3 changed files with 352 additions and 1 deletion


@@ -0,0 +1,68 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "_MFfVhVCa15x"
},
"source": [
"# ZenRows\n",
"\n",
"[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that extracts web data at scale. It specializes in scraping modern websites: rendering JavaScript-heavy content, bypassing anti-bot systems, extracting structured data, and accessing geo-restricted pages.\n",
"\n",
"[langchain-zenrows](https://pypi.org/project/langchain-zenrows/) provides LangChain tools that let LLMs and agents fetch web content through the ZenRows API.\n",
"\n",
"## Installation and Setup\n",
"\n",
"```bash\n",
"pip install langchain-zenrows\n",
"```\n",
"\n",
"Set your ZenRows API key as an environment variable:\n",
"\n",
"```python\n",
"import os\n",
"\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"your-api-key\"\n",
"```\n",
"\n",
"Alternatively, pass it directly when initializing a tool:\n",
"\n",
"```python\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")\n",
"```\n",
"\n",
"## Tools\n",
"\n",
"### ZenRowsUniversalScraper\n",
"\n",
"`ZenRowsUniversalScraper` exposes the following ZenRows scraping features:\n",
"\n",
"- **JavaScript Rendering**: Scrape modern SPAs and dynamic content\n",
"- **Anti-Bot Bypass**: Overcome sophisticated bot-detection systems\n",
"- **Geo-Targeting**: Access region-specific content from 190+ countries\n",
"- **Multiple Output Formats**: HTML, Markdown, plaintext, PDF, and screenshots\n",
"- **CSS Extraction**: Target specific data with CSS selectors\n",
"- **Structured Data Extraction**: Automatically extract emails, phone numbers, links, and more\n",
"- **Session Management**: Maintain consistent sessions across requests\n",
"- **Premium Proxies**: Route requests through residential IPs for higher success rates\n",
"\n",
"See more in the [ZenRows tool documentation](/docs/integrations/tools/zenrows_universal_scraper)."
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
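The provider page above offers two ways to supply the API key, and the tool's options appear to mirror ZenRows API parameters. The following is a minimal sketch of what a request carrying those options might look like. It assumes the Universal Scraper API is a GET endpoint at `https://api.zenrows.com/v1/` that takes the key as `apikey`, the target page as `url`, and options such as `js_render` and `response_type` as query parameters; check the ZenRows API reference before relying on any of this. The helpers `resolve_zenrows_api_key` and `build_zenrows_request_url` are hypothetical and not part of langchain-zenrows. Their key precedence (explicit argument first, then `ZENROWS_API_KEY`) is our own choice, not documented package behavior. Nothing is sent over the network.

```python
import os
from typing import Any, Mapping, Optional
from urllib.parse import urlencode

# Assumption: the Universal Scraper API endpoint; confirm in the ZenRows API reference.
ZENROWS_API_ENDPOINT = "https://api.zenrows.com/v1/"


def resolve_zenrows_api_key(explicit_key: Optional[str] = None) -> str:
    """Hypothetical helper: use an explicit key if given, else ZENROWS_API_KEY."""
    key = explicit_key or os.environ.get("ZENROWS_API_KEY")
    if not key:
        raise ValueError("Set ZENROWS_API_KEY or pass an API key explicitly.")
    return key


def build_zenrows_request_url(
    url: str,
    options: Optional[Mapping[str, Any]] = None,
    api_key: Optional[str] = None,
) -> str:
    """Hypothetical helper: encode the key, target URL, and options as a query string.

    Nothing is sent; this only shows the shape of a request (assumed parameter names).
    """
    params: dict[str, str] = {"apikey": resolve_zenrows_api_key(api_key), "url": url}
    for name, value in (options or {}).items():
        # Booleans become lowercase "true"/"false"; other values go through str().
        params[name] = str(value).lower() if isinstance(value, bool) else str(value)
    # urlencode percent-encodes the target URL so it survives as a single parameter.
    return f"{ZENROWS_API_ENDPOINT}?{urlencode(params)}"


request_url = build_zenrows_request_url(
    "https://httpbin.io/html",
    {"js_render": True, "response_type": "markdown"},
    api_key="<YOUR_ZENROWS_API_KEY>",
)
print(request_url)
```

In practice, `ZenRowsUniversalScraper` handles the request itself. The sketch only makes visible how a boolean flag such as `js_render=True` and the target URL would be encoded into the query string.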


@@ -0,0 +1,281 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "FVo_qZB6crBs"
},
"source": [
"# ZenRowsUniversalScraper\n",
"\n",
"[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that extracts web data at scale. For more information about ZenRows and its Universal Scraper API, visit the [official documentation](https://docs.zenrows.com/universal-scraper-api/).\n",
"\n",
"This page provides a quick overview for getting started with the `ZenRowsUniversalScraper` tool. For detailed documentation of all its features and configurations, head to the [API reference](https://github.com/ZenRows-Hub/langchain-zenrows?tab=readme-ov-file#api-reference).\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"| Class | Package | JS support | Package latest |\n",
"| :--- | :--- | :---: | :---: |\n",
"| [ZenRowsUniversalScraper](https://pypi.org/project/langchain-zenrows/) | [langchain-zenrows](https://pypi.org/project/langchain-zenrows/) | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-zenrows?style=flat-square&label=%20) |\n",
"\n",
"### Tool features\n",
"\n",
"| Feature | Support |\n",
"| :--- | :---: |\n",
"| **JavaScript Rendering** | ✅ |\n",
"| **Anti-Bot Bypass** | ✅ |\n",
"| **Geo-Targeting** | ✅ |\n",
"| **Multiple Output Formats** | ✅ |\n",
"| **CSS Extraction** | ✅ |\n",
"| **Screenshot Capture** | ✅ |\n",
"| **Session Management** | ✅ |\n",
"| **Premium Proxies** | ✅ |\n",
"\n",
"## Setup\n",
"\n",
"Install the required dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"id": "henNSgOlcww5"
},
"outputs": [],
"source": [
"%pip install -qU langchain-zenrows"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IS2yw_UaczgP"
},
"source": [
"### Credentials\n",
"\n",
"You'll need a ZenRows API key to use this tool. You can sign up for free at [ZenRows](https://app.zenrows.com/register?prod=universal_scraper)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "Z097qruic2iH"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hB7fHgmQc5eh"
},
"source": [
"## Instantiation\n",
"\n",
"Here's how to instantiate the `ZenRowsUniversalScraper` tool. Without arguments, it reads the API key from the `ZENROWS_API_KEY` environment variable."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "ezdGcI3Hc8H3"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cUal-Ioic_0k"
},
"source": [
"You can also pass the ZenRows API key when initializing the ZenRowsUniversalScraper tool."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "sPd95HKzdCGr"
},
"outputs": [],
"source": [
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"<YOUR_ZENROWS_API_KEY>\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c8rEvAY4dFX2"
},
"source": [
"## Invocation\n",
"\n",
"### Basic Usage\n",
"\n",
"The tool accepts a URL and various optional parameters to customize the scraping behavior:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GKTDKhXEdGku"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"\n",
"# Initialize the tool\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
"\n",
"# Scrape a simple webpage\n",
"result = zenrows_scraper_tool.invoke({\"url\": \"https://httpbin.io/html\"})\n",
"print(result)"
]
},
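{
"cell_type": "markdown",
"metadata": {
"id": "7Kd1loN5dJbt"
},
"source": [
"### Advanced Usage with Parameters\n",
"\n",
"Optional parameters go in the same input dictionary as the URL. The example below enables JavaScript rendering (`js_render`) and premium proxies (`premium_proxy`), routes the request through a US proxy (`proxy_country`), returns Markdown (`response_type`), and waits 2000 ms (`wait`) before returning the page content."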
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NfJOQdBhdLrp"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
"\n",
"# Scrape with JavaScript rendering and premium proxies\n",
"result = zenrows_scraper_tool.invoke(\n",
" {\n",
" \"url\": \"https://www.scrapingcourse.com/ecommerce/\",\n",
" \"js_render\": True,\n",
" \"premium_proxy\": True,\n",
" \"proxy_country\": \"us\",\n",
" \"response_type\": \"markdown\",\n",
" \"wait\": 2000,\n",
" }\n",
")\n",
"\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8eivshtqdNe0"
},
"source": [
"### Use within an agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JmbPF7xadPgK"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_openai import ChatOpenAI  # or your preferred LLM\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"# Set your ZenRows and OpenAI API keys\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_OPENAI_API_KEY>\"\n",
"\n",
"# Initialize components\n",
"llm = ChatOpenAI(model=\"gpt-4o-mini\")\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
"\n",
"# Create agent\n",
"agent = create_react_agent(llm, [zenrows_scraper_tool])\n",
"\n",
"# Use the agent\n",
"result = agent.invoke(\n",
"    {\n",
"        \"messages\": [\n",
"            {\n",
"                \"role\": \"user\",\n",
"                \"content\": \"Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time.\",\n",
"            }\n",
"        ]\n",
"    }\n",
")\n",
"\n",
"# The last message holds the agent's final answer\n",
"print(\"Agent Response:\")\n",
"print(result[\"messages\"][-1].content)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "k9lqlhoAdRSb"
},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ZenRowsUniversalScraper features and configurations, head to the [**ZenRowsUniversalScraper API reference**](https://github.com/ZenRows-Hub/langchain-zenrows).\n",
"\n",
"For comprehensive information about the underlying API parameters and capabilities, see the [ZenRows Universal Scraper API documentation](https://docs.zenrows.com/universal-scraper-api/api-reference)."
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
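The advanced example in the tool notebook combines options that depend on each other. Based on our reading of the ZenRows API reference (verify there), `proxy_country` only takes effect when `premium_proxy` is enabled, and `wait` only takes effect when `js_render` is enabled. The hypothetical `check_scraper_args` helper below is not part of langchain-zenrows. It is a minimal pre-flight sketch that flags such combinations, a non-http(s) URL, or a negative `wait` before a request is spent. The dependency table `REQUIRES` is an assumption, not an exhaustive list.

```python
from typing import Any, Mapping

# Assumed option dependencies (our reading of the ZenRows API reference):
# each option on the left is only meaningful when the flag on the right is True.
REQUIRES: dict[str, str] = {
    "proxy_country": "premium_proxy",
    "wait": "js_render",
}


def check_scraper_args(args: Mapping[str, Any]) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems: list[str] = []

    # The target must be an absolute http(s) URL.
    if not str(args.get("url", "")).startswith(("http://", "https://")):
        problems.append("url must be an absolute http(s) URL")

    # Flag options whose prerequisite flag is missing or not True.
    for option, prerequisite in REQUIRES.items():
        if option in args and args.get(prerequisite) is not True:
            problems.append(f"{option} requires {prerequisite}=True")

    # `wait` is a delay in milliseconds, so it must be a non-negative integer.
    wait = args.get("wait")
    if wait is not None and (isinstance(wait, bool) or not isinstance(wait, int) or wait < 0):
        problems.append("wait must be a non-negative integer (milliseconds)")

    return problems


# The arguments from the advanced example above.
advanced_args = {
    "url": "https://www.scrapingcourse.com/ecommerce/",
    "js_render": True,
    "premium_proxy": True,
    "proxy_country": "us",
    "response_type": "markdown",
    "wait": 2000,
}

print(check_scraper_args(advanced_args))  # → []
```

If the check returns an empty list, you can pass the same dictionary unchanged to `zenrows_scraper_tool.invoke(advanced_args)`. Returning a list rather than raising on the first problem lets you report every conflicting option at once.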


@@ -729,8 +729,10 @@ packages:
 - name: langchain-scrapeless
   repo: scrapeless-ai/langchain-scrapeless
   path: .
+- name: langchain-zenrows
+  path: .
+  repo: ZenRows-Hub/langchain-zenrows
 - name: langchain-oci
   name_title: Oracle Cloud Infrastructure (OCI)
   repo: oracle/langchain-oracle
   path: .