docs: add ZenRows provider and tool integration docs (#31742)

**Description:** Adds documentation for ZenRows integration with LangChain, including provider overview and detailed tool documentation. ZenRows is an enterprise-grade web scraping solution that enables LangChain agents to extract web content at scale with advanced features like JavaScript rendering, anti-bot bypass, geo-targeting, and multiple output formats. This PR includes: - Provider documentation (`docs/docs/integrations/providers/zenrows.ipynb`) - Tool documentation (`docs/docs/integrations/tools/zenrows_universal_scraper.ipynb`) - Complete usage examples and API reference links **Issue:** N/A **Dependencies:** - [langchain-zenrows](https://github.com/ZenRows-Hub/langchain-zenrows) package (external, available on [PyPI](https://pypi.org/project/langchain-zenrows/)) - No changes to core LangChain dependencies **LinkedIn handle:** https://www.linkedin.com/company/zenrows/ --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-20 18:12:35 +00:00 · 2025-09-13 02:07:49 +05:30
parent f11dd177e9
commit 3420ca1da2
3 changed files with 352 additions and 1 deletions
--- a/docs/docs/integrations/providers/zenrows.ipynb
+++ b/docs/docs/integrations/providers/zenrows.ipynb
@@ -0,0 +1,68 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_MFfVhVCa15x"
      },
      "source": [
        "# ZenRows\n",
        "\n",
        "[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. ZenRows specializes in scraping modern websites, bypassing anti-bot systems, extracting structured data from any website, rendering JavaScript-heavy content, accessing geo-restricted websites, and more.\n",
        "\n",
        "[langchain-zenrows](https://pypi.org/project/langchain-zenrows/) provides tools that allow LLMs to access web data using ZenRows' powerful scraping infrastructure.\n",
        "\n",
        "## Installation and Setup\n",
        "\n",
        "```bash\n",
        "pip install langchain-zenrows\n",
        "```\n",
        "\n",
        "You'll need to set up your ZenRows API key:\n",
        "\n",
        "```python\n",
        "import os\n",
        "os.environ[\"ZENROWS_API_KEY\"] = \"your-api-key\"\n",
        "```\n",
        "\n",
        "Or you can pass it directly when initializing tools:\n",
        "\n",
        "```python\n",
        "from langchain_zenrows import ZenRowsUniversalScraper\n",
        "zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")\n",
        "```\n",
        "\n",
        "## Tools\n",
        "\n",
        "### ZenRowsUniversalScraper\n",
        "\n",
        "The ZenRows integration provides comprehensive web scraping features:\n",
        "\n",
        "- **JavaScript Rendering**: Scrape modern SPAs and dynamic content\n",
        "- **Anti-Bot Bypass**: Overcome sophisticated bot detection systems  \n",
        "- **Geo-Targeting**: Access region-specific content with 190+ countries\n",
        "- **Multiple Output Formats**: HTML, Markdown, Plaintext, PDF, Screenshots\n",
        "- **CSS Extraction**: Target specific data with CSS selectors\n",
        "- **Structured Data Extraction**: Automatically extract emails, phone numbers, links, and more\n",
        "- **Session Management**: Maintain consistent sessions across requests\n",
        "- **Premium Proxies**: Residential IPs for maximum success rates\n",
        "\n",
        "See more in the [ZenRows tool documentation](/docs/integrations/tools/zenrows_universal_scraper)."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
 }
--- a/docs/docs/integrations/tools/zenrows_universal_scraper.ipynb
+++ b/docs/docs/integrations/tools/zenrows_universal_scraper.ipynb
@@ -0,0 +1,281 @@
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FVo_qZB6crBs"
   },
   "source": [
    "# ZenRowsUniversalScraper\n",
    "\n",
    "[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. For more information about ZenRows and its Universal Scraper API, visit the [official documentation](https://docs.zenrows.com/universal-scraper-api/).\n",
    "\n",
    "This document provides a quick overview for getting started with ZenRowsUniversalScraper tool. For detailed documentation of all ZenRowsUniversalScraper features and configurations head to the [API reference](https://github.com/ZenRows-Hub/langchain-zenrows?tab=readme-ov-file#api-reference).\n",
    "\n",
    "## Overview\n",
    "\n",
    "### Integration details\n",
    "\n",
    "| Class | Package | JS support |  Package latest |\n",
    "| :--- | :--- | :---: | :---: |\n",
    "| [ZenRowsUniversalScraper](https://pypi.org/project/langchain-zenrows/) | [langchain-zenrows](https://pypi.org/project/langchain-zenrows/) | ❌ |  ![PyPI - Version](https://img.shields.io/pypi/v/langchain-zenrows?style=flat-square&label=%20) |\n",
    "\n",
    "### Tool features\n",
    "\n",
    "| Feature | Support |\n",
    "| :--- | :---: |\n",
    "| **JavaScript Rendering** | ✅ |\n",
    "| **Anti-Bot Bypass** | ✅ |\n",
    "| **Geo-Targeting** | ✅ |\n",
    "| **Multiple Output Formats** | ✅ |\n",
    "| **CSS Extraction** | ✅ |\n",
    "| **Screenshot Capture** | ✅ |\n",
    "| **Session Management** | ✅ |\n",
    "| **Premium Proxies** | ✅ |\n",
    "\n",
    "## Setup\n",
    "\n",
    "Install the required dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "id": "henNSgOlcww5"
   },
   "outputs": [],
   "source": [
    "pip install langchain-zenrows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IS2yw_UaczgP"
   },
   "source": [
    "### Credentials\n",
    "\n",
    "You'll need a ZenRows API key to use this tool. You can sign up for free at [ZenRows](https://app.zenrows.com/register?prod=universal_scraper)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "Z097qruic2iH"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hB7fHgmQc5eh"
   },
   "source": [
    "## Instantiation\n",
    "\n",
    "Here's how to instantiate an instance of the ZenRowsUniversalScraper tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "ezdGcI3Hc8H3"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cUal-Ioic_0k"
   },
   "source": [
    "You can also pass the ZenRows API key when initializing the ZenRowsUniversalScraper tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "sPd95HKzdCGr"
   },
   "outputs": [],
   "source": [
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c8rEvAY4dFX2"
   },
   "source": [
    "## Invocation\n",
    "\n",
    "### Basic Usage\n",
    "\n",
    "The tool accepts a URL and various optional parameters to customize the scraping behavior:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GKTDKhXEdGku"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "# Initialize the tool\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Scrape a simple webpage\n",
    "result = zenrows_scraper_tool.invoke({\"url\": \"https://httpbin.io/html\"})\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7Kd1loN5dJbt"
   },
   "source": [
    "### Advanced Usage with Parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NfJOQdBhdLrp"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Scrape with JavaScript rendering and premium proxies\n",
    "result = zenrows_scraper_tool.invoke(\n",
    "    {\n",
    "        \"url\": \"https://www.scrapingcourse.com/ecommerce/\",\n",
    "        \"js_render\": True,\n",
    "        \"premium_proxy\": True,\n",
    "        \"proxy_country\": \"us\",\n",
    "        \"response_type\": \"markdown\",\n",
    "        \"wait\": 2000,\n",
    "    }\n",
    ")\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8eivshtqdNe0"
   },
   "source": [
    "### Use within an agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "JmbPF7xadPgK"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_openai import ChatOpenAI  # or your preferred LLM\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "from langgraph.prebuilt import create_react_agent\n",
    "\n",
    "# Set your ZenRows and OpenAI API keys\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_OPEN_AI_API_KEY>\"\n",
    "\n",
    "\n",
    "# Initialize components\n",
    "llm = ChatOpenAI(model=\"gpt-4o-mini\")\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Create agent\n",
    "agent = create_react_agent(llm, [zenrows_scraper_tool])\n",
    "\n",
    "# Use the agent\n",
    "result = agent.invoke(\n",
    "    {\n",
    "        \"messages\": \"Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time.\"\n",
    "    }\n",
    ")\n",
    "\n",
    "print(\"Agent Response:\")\n",
    "for message in result[\"messages\"]:\n",
    "    print(f\"{message.content}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "k9lqlhoAdRSb"
   },
   "source": [
    "## API reference\n",
    "\n",
    "For detailed documentation of all ZenRowsUniversalScraper features and configurations head to the [**ZenRowsUniversalScraper API reference**](https://github.com/ZenRows-Hub/langchain-zenrows).\n",
    "\n",
    "For comprehensive information about the underlying API parameters and capabilities, see the [ZenRows Universal API documentation](https://docs.zenrows.com/universal-scraper-api/api-reference)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
 }
--- a/libs/packages.yml
+++ b/libs/packages.yml
@@ -729,8 +729,10 @@ packages:
 - name: langchain-scrapeless
  repo: scrapeless-ai/langchain-scrapeless
  path: .
 - name: langchain-zenrows
  path: .
  repo: ZenRows-Hub/langchain-zenrows
 - name: langchain-oci
  name_title: Oracle Cloud Infrastructure (OCI)
  repo: oracle/langchain-oracle
  path: .