From 3420ca1da23a1ccd88328fe64e3b87ffdc49b09b Mon Sep 17 00:00:00 2001 From: Yuvraj Chandra Date: Sat, 13 Sep 2025 02:07:49 +0530 Subject: [PATCH] docs: add ZenRows provider and tool integration docs (#31742) **Description:** Adds documentation for ZenRows integration with LangChain, including provider overview and detailed tool documentation. ZenRows is an enterprise-grade web scraping solution that enables LangChain agents to extract web content at scale with advanced features like JavaScript rendering, anti-bot bypass, geo-targeting, and multiple output formats. This PR includes: - Provider documentation (`docs/docs/integrations/providers/zenrows.ipynb`) - Tool documentation (`docs/docs/integrations/tools/zenrows_universal_scraper.ipynb`) - Complete usage examples and API reference links **Issue:** N/A **Dependencies:** - [langchain-zenrows](https://github.com/ZenRows-Hub/langchain-zenrows) package (external, available on [PyPI](https://pypi.org/project/langchain-zenrows/)) - No changes to core LangChain dependencies **LinkedIn handle:** https://www.linkedin.com/company/zenrows/ --------- Co-authored-by: Mason Daugherty --- .../docs/integrations/providers/zenrows.ipynb | 68 +++++ .../tools/zenrows_universal_scraper.ipynb | 281 ++++++++++++++++++ libs/packages.yml | 4 +- 3 files changed, 352 insertions(+), 1 deletion(-) create mode 100644 docs/docs/integrations/providers/zenrows.ipynb create mode 100644 docs/docs/integrations/tools/zenrows_universal_scraper.ipynb diff --git a/docs/docs/integrations/providers/zenrows.ipynb b/docs/docs/integrations/providers/zenrows.ipynb new file mode 100644 index 00000000000..7f5c1ffe6a3 --- /dev/null +++ b/docs/docs/integrations/providers/zenrows.ipynb @@ -0,0 +1,68 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "_MFfVhVCa15x" + }, + "source": [ + "# ZenRows\n", + "\n", + "[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction 
capabilities at scale. ZenRows specializes in scraping modern websites, bypassing anti-bot systems, extracting structured data from any website, rendering JavaScript-heavy content, accessing geo-restricted websites, and more.\n", + "\n", + "[langchain-zenrows](https://pypi.org/project/langchain-zenrows/) provides tools that allow LLMs to access web data using ZenRows' powerful scraping infrastructure.\n", + "\n", + "## Installation and Setup\n", + "\n", + "```bash\n", + "pip install langchain-zenrows\n", + "```\n", + "\n", + "You'll need to set up your ZenRows API key:\n", + "\n", + "```python\n", + "import os\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"your-api-key\"\n", + "```\n", + "\n", + "Or you can pass it directly when initializing tools:\n", + "\n", + "```python\n", + "from langchain_zenrows import ZenRowsUniversalScraper\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")\n", + "```\n", + "\n", + "## Tools\n", + "\n", + "### ZenRowsUniversalScraper\n", + "\n", + "The ZenRows integration provides comprehensive web scraping features:\n", + "\n", + "- **JavaScript Rendering**: Scrape modern SPAs and dynamic content\n", + "- **Anti-Bot Bypass**: Overcome sophisticated bot detection systems\n", + "- **Geo-Targeting**: Access region-specific content from 190+ countries\n", + "- **Multiple Output Formats**: HTML, Markdown, Plaintext, PDF, Screenshots\n", + "- **CSS Extraction**: Target specific data with CSS selectors\n", + "- **Structured Data Extraction**: Automatically extract emails, phone numbers, links, and more\n", + "- **Session Management**: Maintain consistent sessions across requests\n", + "- **Premium Proxies**: Residential IPs for maximum success rates\n", + "\n", + "See more in the [ZenRows tool documentation](/docs/integrations/tools/zenrows_universal_scraper)."
+ ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/docs/integrations/tools/zenrows_universal_scraper.ipynb b/docs/docs/integrations/tools/zenrows_universal_scraper.ipynb new file mode 100644 index 00000000000..54d6364cc2d --- /dev/null +++ b/docs/docs/integrations/tools/zenrows_universal_scraper.ipynb @@ -0,0 +1,281 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "FVo_qZB6crBs" + }, + "source": [ + "# ZenRowsUniversalScraper\n", + "\n", + "[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. For more information about ZenRows and its Universal Scraper API, visit the [official documentation](https://docs.zenrows.com/universal-scraper-api/).\n", + "\n", + "This document provides a quick overview for getting started with the ZenRowsUniversalScraper tool.
For detailed documentation of all ZenRowsUniversalScraper features and configurations, head to the [API reference](https://github.com/ZenRows-Hub/langchain-zenrows?tab=readme-ov-file#api-reference).\n", + "\n", + "## Overview\n", + "\n", + "### Integration details\n", + "\n", + "| Class | Package | JS support | Package latest |\n", + "| :--- | :--- | :---: | :---: |\n", + "| [ZenRowsUniversalScraper](https://pypi.org/project/langchain-zenrows/) | [langchain-zenrows](https://pypi.org/project/langchain-zenrows/) | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-zenrows?style=flat-square&label=%20) |\n", + "\n", + "### Tool features\n", + "\n", + "| Feature | Support |\n", + "| :--- | :---: |\n", + "| **JavaScript Rendering** | ✅ |\n", + "| **Anti-Bot Bypass** | ✅ |\n", + "| **Geo-Targeting** | ✅ |\n", + "| **Multiple Output Formats** | ✅ |\n", + "| **CSS Extraction** | ✅ |\n", + "| **Screenshot Capture** | ✅ |\n", + "| **Session Management** | ✅ |\n", + "| **Premium Proxies** | ✅ |\n", + "\n", + "## Setup\n", + "\n", + "Install the required dependencies." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "id": "henNSgOlcww5" + }, + "outputs": [], + "source": [ + "pip install langchain-zenrows" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IS2yw_UaczgP" + }, + "source": [ + "### Credentials\n", + "\n", + "You'll need a ZenRows API key to use this tool. You can sign up for free at [ZenRows](https://app.zenrows.com/register?prod=universal_scraper)." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "Z097qruic2iH" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# Set your ZenRows API key\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hB7fHgmQc5eh" + }, + "source": [ + "## Instantiation\n", + "\n", + "Here's how to create an instance of the ZenRowsUniversalScraper tool."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "ezdGcI3Hc8H3" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from langchain_zenrows import ZenRowsUniversalScraper\n", + "\n", + "# Set your ZenRows API key\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"\"\n", + "\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cUal-Ioic_0k" + }, + "source": [ + "You can also pass the ZenRows API key when initializing the ZenRowsUniversalScraper tool." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "sPd95HKzdCGr" + }, + "outputs": [], + "source": [ + "from langchain_zenrows import ZenRowsUniversalScraper\n", + "\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c8rEvAY4dFX2" + }, + "source": [ + "## Invocation\n", + "\n", + "### Basic Usage\n", + "\n", + "The tool accepts a URL and various optional parameters to customize the scraping behavior:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GKTDKhXEdGku" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from langchain_zenrows import ZenRowsUniversalScraper\n", + "\n", + "# Set your ZenRows API key\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"\"\n", + "\n", + "# Initialize the tool\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper()\n", + "\n", + "# Scrape a simple webpage\n", + "result = zenrows_scraper_tool.invoke({\"url\": \"https://httpbin.io/html\"})\n", + "print(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7Kd1loN5dJbt" + }, + "source": [ + "### Advanced Usage with Parameters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NfJOQdBhdLrp" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from langchain_zenrows import 
ZenRowsUniversalScraper\n", + "\n", + "# Set your ZenRows API key\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"\"\n", + "\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper()\n", + "\n", + "# Scrape with JavaScript rendering and premium proxies\n", + "result = zenrows_scraper_tool.invoke(\n", + " {\n", + " \"url\": \"https://www.scrapingcourse.com/ecommerce/\",\n", + " \"js_render\": True,\n", + " \"premium_proxy\": True,\n", + " \"proxy_country\": \"us\",\n", + " \"response_type\": \"markdown\",\n", + " \"wait\": 2000,\n", + " }\n", + ")\n", + "\n", + "print(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8eivshtqdNe0" + }, + "source": [ + "### Use within an agent" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JmbPF7xadPgK" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from langchain_openai import ChatOpenAI # or your preferred LLM\n", + "from langchain_zenrows import ZenRowsUniversalScraper\n", + "from langgraph.prebuilt import create_react_agent\n", + "\n", + "# Set your ZenRows and OpenAI API keys\n", + "os.environ[\"ZENROWS_API_KEY\"] = \"\"\n", + "os.environ[\"OPENAI_API_KEY\"] = \"\"\n", + "\n", + "\n", + "# Initialize components\n", + "llm = ChatOpenAI(model=\"gpt-4o-mini\")\n", + "zenrows_scraper_tool = ZenRowsUniversalScraper()\n", + "\n", + "# Create agent\n", + "agent = create_react_agent(llm, [zenrows_scraper_tool])\n", + "\n", + "# Use the agent\n", + "result = agent.invoke(\n", + " {\n", + " \"messages\": \"Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time.\"\n", + " }\n", + ")\n", + "\n", + "print(\"Agent Response:\")\n", + "for message in result[\"messages\"]:\n", + " print(f\"{message.content}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "k9lqlhoAdRSb" + }, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all ZenRowsUniversalScraper features and 
configurations, head to the [**ZenRowsUniversalScraper API reference**](https://github.com/ZenRows-Hub/langchain-zenrows).\n", + "\n", + "For comprehensive information about the underlying API parameters and capabilities, see the [ZenRows Universal Scraper API documentation](https://docs.zenrows.com/universal-scraper-api/api-reference)." + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/libs/packages.yml b/libs/packages.yml index 3e575c1dbcc..5ee7e4aa077 100644 --- a/libs/packages.yml +++ b/libs/packages.yml @@ -729,8 +729,10 @@ packages: - name: langchain-scrapeless repo: scrapeless-ai/langchain-scrapeless path: . +- name: langchain-zenrows + path: . + repo: ZenRows-Hub/langchain-zenrows - name: langchain-oci name_title: Oracle Cloud Infrastructure (OCI) repo: oracle/langchain-oracle path: . -