langchain/docs/docs/integrations/tools/zenrows_universal_scraper.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FVo_qZB6crBs"
   },
   "source": [
    "# ZenRowsUniversalScraper\n",
    "\n",
    "[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. For more information about ZenRows and its Universal Scraper API, visit the [official documentation](https://docs.zenrows.com/universal-scraper-api/).\n",
    "\n",
    "This document provides a quick overview for getting started with ZenRowsUniversalScraper tool. For detailed documentation of all ZenRowsUniversalScraper features and configurations head to the [API reference](https://github.com/ZenRows-Hub/langchain-zenrows?tab=readme-ov-file#api-reference).\n",
    "\n",
    "## Overview\n",
    "\n",
    "### Integration details\n",
    "\n",
    "| Class | Package | JS support |  Package latest |\n",
    "| :--- | :--- | :---: | :---: |\n",
    "| [ZenRowsUniversalScraper](https://pypi.org/project/langchain-zenrows/) | [langchain-zenrows](https://pypi.org/project/langchain-zenrows/) | ❌ |  ![PyPI - Version](https://img.shields.io/pypi/v/langchain-zenrows?style=flat-square&label=%20) |\n",
    "\n",
    "### Tool features\n",
    "\n",
    "| Feature | Support |\n",
    "| :--- | :---: |\n",
    "| **JavaScript Rendering** | ✅ |\n",
    "| **Anti-Bot Bypass** | ✅ |\n",
    "| **Geo-Targeting** | ✅ |\n",
    "| **Multiple Output Formats** | ✅ |\n",
    "| **CSS Extraction** | ✅ |\n",
    "| **Screenshot Capture** | ✅ |\n",
    "| **Session Management** | ✅ |\n",
    "| **Premium Proxies** | ✅ |\n",
    "\n",
    "## Setup\n",
    "\n",
    "Install the required dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "id": "henNSgOlcww5"
   },
   "outputs": [],
   "source": [
    "pip install langchain-zenrows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IS2yw_UaczgP"
   },
   "source": [
    "### Credentials\n",
    "\n",
    "You'll need a ZenRows API key to use this tool. You can sign up for free at [ZenRows](https://app.zenrows.com/register?prod=universal_scraper)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "Z097qruic2iH"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hB7fHgmQc5eh"
   },
   "source": [
    "## Instantiation\n",
    "\n",
    "Here's how to instantiate an instance of the ZenRowsUniversalScraper tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "ezdGcI3Hc8H3"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cUal-Ioic_0k"
   },
   "source": [
    "You can also pass the ZenRows API key when initializing the ZenRowsUniversalScraper tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "sPd95HKzdCGr"
   },
   "outputs": [],
   "source": [
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c8rEvAY4dFX2"
   },
   "source": [
    "## Invocation\n",
    "\n",
    "### Basic Usage\n",
    "\n",
    "The tool accepts a URL and various optional parameters to customize the scraping behavior:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GKTDKhXEdGku"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "# Initialize the tool\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Scrape a simple webpage\n",
    "result = zenrows_scraper_tool.invoke({\"url\": \"https://httpbin.io/html\"})\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7Kd1loN5dJbt"
   },
   "source": [
    "### Advanced Usage with Parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NfJOQdBhdLrp"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Scrape with JavaScript rendering and premium proxies\n",
    "result = zenrows_scraper_tool.invoke(\n",
    "    {\n",
    "        \"url\": \"https://www.scrapingcourse.com/ecommerce/\",\n",
    "        \"js_render\": True,\n",
    "        \"premium_proxy\": True,\n",
    "        \"proxy_country\": \"us\",\n",
    "        \"response_type\": \"markdown\",\n",
    "        \"wait\": 2000,\n",
    "    }\n",
    ")\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8eivshtqdNe0"
   },
   "source": [
    "### Use within an agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "JmbPF7xadPgK"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_openai import ChatOpenAI  # or your preferred LLM\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "from langgraph.prebuilt import create_react_agent\n",
    "\n",
    "# Set your ZenRows and OpenAI API keys\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_OPEN_AI_API_KEY>\"\n",
    "\n",
    "\n",
    "# Initialize components\n",
    "llm = ChatOpenAI(model=\"gpt-4o-mini\")\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Create agent\n",
    "agent = create_react_agent(llm, [zenrows_scraper_tool])\n",
    "\n",
    "# Use the agent\n",
    "result = agent.invoke(\n",
    "    {\n",
    "        \"messages\": \"Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time.\"\n",
    "    }\n",
    ")\n",
    "\n",
    "print(\"Agent Response:\")\n",
    "for message in result[\"messages\"]:\n",
    "    print(f\"{message.content}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "k9lqlhoAdRSb"
   },
   "source": [
    "## API reference\n",
    "\n",
    "For detailed documentation of all ZenRowsUniversalScraper features and configurations head to the [**ZenRowsUniversalScraper API reference**](https://github.com/ZenRows-Hub/langchain-zenrows).\n",
    "\n",
    "For comprehensive information about the underlying API parameters and capabilities, see the [ZenRows Universal API documentation](https://docs.zenrows.com/universal-scraper-api/api-reference)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}