mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-21 14:43:07 +00:00
**Description:** Adds documentation for ZenRows integration with LangChain, including provider overview and detailed tool documentation.

ZenRows is an enterprise-grade web scraping solution that enables LangChain agents to extract web content at scale with advanced features like JavaScript rendering, anti-bot bypass, geo-targeting, and multiple output formats.

This PR includes:
- Provider documentation (`docs/docs/integrations/providers/zenrows.ipynb`)
- Tool documentation (`docs/docs/integrations/tools/zenrows_universal_scraper.ipynb`)
- Complete usage examples and API reference links

**Issue:** N/A

**Dependencies:**
- [langchain-zenrows](https://github.com/ZenRows-Hub/langchain-zenrows) package (external, available on [PyPI](https://pypi.org/project/langchain-zenrows/))
- No changes to core LangChain dependencies

**LinkedIn handle:** https://www.linkedin.com/company/zenrows/

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
281 lines
7.5 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FVo_qZB6crBs"
   },
   "source": [
    "# ZenRowsUniversalScraper\n",
    "\n",
    "[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. For more information about ZenRows and its Universal Scraper API, visit the [official documentation](https://docs.zenrows.com/universal-scraper-api/).\n",
    "\n",
    "This document provides a quick overview for getting started with the ZenRowsUniversalScraper tool. For detailed documentation of all ZenRowsUniversalScraper features and configurations, head to the [API reference](https://github.com/ZenRows-Hub/langchain-zenrows?tab=readme-ov-file#api-reference).\n",
    "\n",
    "## Overview\n",
    "\n",
    "### Integration details\n",
    "\n",
    "| Class | Package | JS support | Package latest |\n",
    "| :--- | :--- | :---: | :---: |\n",
    "| [ZenRowsUniversalScraper](https://pypi.org/project/langchain-zenrows/) | [langchain-zenrows](https://pypi.org/project/langchain-zenrows/) | ❌ |  |\n",
    "\n",
    "### Tool features\n",
    "\n",
    "| Feature | Support |\n",
    "| :--- | :---: |\n",
    "| **JavaScript Rendering** | ✅ |\n",
    "| **Anti-Bot Bypass** | ✅ |\n",
    "| **Geo-Targeting** | ✅ |\n",
    "| **Multiple Output Formats** | ✅ |\n",
    "| **CSS Extraction** | ✅ |\n",
    "| **Screenshot Capture** | ✅ |\n",
    "| **Session Management** | ✅ |\n",
    "| **Premium Proxies** | ✅ |\n",
    "\n",
    "## Setup\n",
    "\n",
    "Install the required dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "id": "henNSgOlcww5"
   },
   "outputs": [],
   "source": [
    "%pip install langchain-zenrows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IS2yw_UaczgP"
   },
   "source": [
    "### Credentials\n",
    "\n",
    "You'll need a ZenRows API key to use this tool. You can sign up for free at [ZenRows](https://app.zenrows.com/register?prod=universal_scraper)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "Z097qruic2iH"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hB7fHgmQc5eh"
   },
   "source": [
    "## Instantiation\n",
    "\n",
    "Here's how to instantiate the ZenRowsUniversalScraper tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "ezdGcI3Hc8H3"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cUal-Ioic_0k"
   },
   "source": [
    "You can also pass the ZenRows API key when initializing the ZenRowsUniversalScraper tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "sPd95HKzdCGr"
   },
   "outputs": [],
   "source": [
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c8rEvAY4dFX2"
   },
   "source": [
    "## Invocation\n",
    "\n",
    "### Basic Usage\n",
    "\n",
    "The tool accepts a URL and various optional parameters to customize the scraping behavior:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GKTDKhXEdGku"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "# Initialize the tool\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Scrape a simple webpage\n",
    "result = zenrows_scraper_tool.invoke({\"url\": \"https://httpbin.io/html\"})\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7Kd1loN5dJbt"
   },
   "source": [
    "### Advanced Usage with Parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NfJOQdBhdLrp"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "\n",
    "# Set your ZenRows API key\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Scrape with JavaScript rendering and premium proxies\n",
    "result = zenrows_scraper_tool.invoke(\n",
    "    {\n",
    "        \"url\": \"https://www.scrapingcourse.com/ecommerce/\",\n",
    "        \"js_render\": True,\n",
    "        \"premium_proxy\": True,\n",
    "        \"proxy_country\": \"us\",\n",
    "        \"response_type\": \"markdown\",\n",
    "        \"wait\": 2000,\n",
    "    }\n",
    ")\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8eivshtqdNe0"
   },
   "source": [
    "### Use within an agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "JmbPF7xadPgK"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_openai import ChatOpenAI  # or your preferred LLM\n",
    "from langchain_zenrows import ZenRowsUniversalScraper\n",
    "from langgraph.prebuilt import create_react_agent\n",
    "\n",
    "# Set your ZenRows and OpenAI API keys\n",
    "os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_OPENAI_API_KEY>\"\n",
    "\n",
    "# Initialize components\n",
    "llm = ChatOpenAI(model=\"gpt-4o-mini\")\n",
    "zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
    "\n",
    "# Create agent\n",
    "agent = create_react_agent(llm, [zenrows_scraper_tool])\n",
    "\n",
    "# Use the agent\n",
    "result = agent.invoke(\n",
    "    {\n",
    "        \"messages\": \"Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time.\"\n",
    "    }\n",
    ")\n",
    "\n",
    "# Print every message in the conversation, including tool calls and results\n",
    "print(\"Agent Response:\")\n",
    "for message in result[\"messages\"]:\n",
    "    print(message.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "k9lqlhoAdRSb"
   },
   "source": [
    "## API reference\n",
    "\n",
    "For detailed documentation of all ZenRowsUniversalScraper features and configurations, head to the [**ZenRowsUniversalScraper API reference**](https://github.com/ZenRows-Hub/langchain-zenrows).\n",
    "\n",
    "For comprehensive information about the underlying API parameters and capabilities, see the [ZenRows Universal Scraper API documentation](https://docs.zenrows.com/universal-scraper-api/api-reference)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}