docs: add langchain-scraperapi (#31973)

Adds documentation for the integration langchain-scraperapi, which contains 3 tools using the ScraperAPI service. The tools give AI agents the ability to:

- Scrape the web and return HTML/text/markdown
- Perform Google search and return json output
- Perform Amazon search and return json output

For reference, here is the official repo for langchain_scraperapi: https://github.com/scraperapi/langchain-scraperapi
2025-09-17 15:35:14 +00:00 · 2025-09-17 04:46:20 +03:00
parent f8640630d8
commit 543d90e108
3 changed files with 404 additions and 0 deletions
--- a/docs/docs/integrations/providers/scraperapi.ipynb
+++ b/docs/docs/integrations/providers/scraperapi.ipynb
@@ -0,0 +1,72 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# ScraperAPI\n",
+    "\n",
+    "[ScraperAPI](https://www.scraperapi.com/) is a web scraping API for collecting data from any public website; it handles proxies, browsers, and CAPTCHAs so you don't have to. [langchain-scraperapi](https://github.com/scraperapi/langchain-scraperapi) wraps this service, making it easy for AI agents to browse the web and scrape data from it.\n",
+    "\n",
+    "## Installation and Setup\n",
+    "\n",
+    "- Install the Python package with `pip install langchain-scraperapi`.\n",
+    "- Obtain an API key from [ScraperAPI](https://www.scraperapi.com/) and set the environment variable `SCRAPERAPI_API_KEY`.\n",
+    "\n",
+    "## Tools\n",
+    "\n",
+    "The package offers three tools: one scrapes any website, one returns structured Google search results, and one returns structured Amazon search results.\n",
+    "\n",
+    "To import them:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -U langchain-scraperapi\n",
+    "\n",
+    "from langchain_scraperapi.tools import (\n",
+    "    ScraperAPIAmazonSearchTool,\n",
+    "    ScraperAPIGoogleSearchTool,\n",
+    "    ScraperAPITool,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Example use:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tool = ScraperAPITool()\n",
+    "\n",
+    "result = tool.invoke({\"url\": \"https://example.com\", \"output_format\": \"markdown\"})\n",
+    "print(result)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For a more detailed walkthrough of how to use these tools, visit the [official repository](https://github.com/scraperapi/langchain-scraperapi)."
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/docs/docs/integrations/tools/scraperapi.ipynb
+++ b/docs/docs/integrations/tools/scraperapi.ipynb
@@ -0,0 +1,329 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "d3a12ba8",
+   "metadata": {},
+   "source": [
+    "# LangChain – ScraperAPI\n",
+    "\n",
+    "Give your AI agent the ability to browse websites and search Google and Amazon in just two lines of code.\n",
+    "\n",
+    "The `langchain-scraperapi` package adds three ready-to-use LangChain tools backed by the [ScraperAPI](https://www.scraperapi.com/) service:\n",
+    "\n",
+    "| Tool class | Use it to |\n",
+    "|------------|------------------|\n",
+    "| `ScraperAPITool` | Grab the HTML/text/markdown of any web page |\n",
+    "| `ScraperAPIGoogleSearchTool` | Get structured Google Search SERP data |\n",
+    "| `ScraperAPIAmazonSearchTool` | Get structured Amazon product-search data |\n",
+    "\n",
+    "## Overview\n",
+    "\n",
+    "### Integration details\n",
+    "\n",
+    "| Package | Serializable | JS support |  Package latest |\n",
+    "| :--- | :---: | :---: | :---: |\n",
+    "| [langchain-scraperapi](https://pypi.org/project/langchain-scraperapi/) | ❌ | ❌ |  v0.1.1 |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d1f7c70f",
+   "metadata": {},
+   "source": [
+    "### Setup\n",
+    "\n",
+    "Install the `langchain-scraperapi` package."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "494ecbc3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -U langchain-scraperapi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c111d2fb",
+   "metadata": {},
+   "source": [
+    "### Credentials\n",
+    "\n",
+    "Create an account at https://www.scraperapi.com/, get an API key, and set it as the `SCRAPERAPI_API_KEY` environment variable."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4d315465",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "os.environ[\"SCRAPERAPI_API_KEY\"] = \"your-api-key\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e06ffe48",
+   "metadata": {},
+   "source": [
+    "## Instantiation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "27ae5612",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_scraperapi.tools import ScraperAPITool\n",
+    "\n",
+    "tool = ScraperAPITool()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9ff46136",
+   "metadata": {},
+   "source": [
+    "## Invocation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6e1a4c7f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "output = tool.invoke(\n",
+    "    {\n",
+    "        \"url\": \"https://langchain.com\",\n",
+    "        \"output_format\": \"markdown\",\n",
+    "        \"render\": True,\n",
+    "    }\n",
+    ")\n",
+    "print(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "051ef7b1",
+   "metadata": {},
+   "source": [
+    "## Features\n",
+    "\n",
+    "### 1. `ScraperAPITool` — browse any website\n",
+    "\n",
+    "Invoke the *raw* ScraperAPI endpoint and get HTML, rendered DOM, text, or markdown.\n",
+    "\n",
+    "**Invocation arguments**\n",
+    "\n",
+    "* **`url`** **(required)** – target page URL  \n",
+    "* **Optional (mirror ScraperAPI query params)**  \n",
+    "  * `output_format`: `\"text\"` | `\"markdown\"` (default returns raw HTML)  \n",
+    "  * `country_code`: e.g. `\"us\"`, `\"de\"`  \n",
+    "  * `device_type`: `\"desktop\"` | `\"mobile\"`  \n",
+    "  * `premium`: `bool` – use premium proxies  \n",
+    "  * `render`: `bool` – run JS before returning HTML  \n",
+    "  * `keep_headers`: `bool` – include response headers  \n",
+    "  \n",
+    "For the complete set of modifiers, see the [ScraperAPI request-customisation docs](https://docs.scraperapi.com/python/making-requests/customizing-requests).\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1a0c7cc2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_scraperapi.tools import ScraperAPITool\n",
+    "\n",
+    "tool = ScraperAPITool()\n",
+    "\n",
+    "markdown_text = tool.invoke(\n",
+    "    {\n",
+    "        \"url\": \"https://langchain.com\",\n",
+    "        \"output_format\": \"markdown\",\n",
+    "        \"render\": True,\n",
+    "    }\n",
+    ")\n",
+    "print(markdown_text[:300], \"…\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9f2947dd",
+   "metadata": {},
+   "source": [
+    "### 2. `ScraperAPIGoogleSearchTool` — structured Google Search\n",
+    "\n",
+    "Structured SERP data via `/structured/google/search`.\n",
+    "\n",
+    "**Invocation arguments**\n",
+    "\n",
+    "* **`query`** **(required)** – natural-language search string  \n",
+    "* **Optional** — `country_code`, `tld`, `uule`, `hl`, `gl`, `ie`, `oe`, `start`, `num`  \n",
+    "* `output_format`: `\"json\"` (default) or `\"csv\"`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aeac1195",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool\n",
+    "\n",
+    "google_search = ScraperAPIGoogleSearchTool()\n",
+    "\n",
+    "results = google_search.invoke(\n",
+    "    {\n",
+    "        \"query\": \"what is langchain\",\n",
+    "        \"num\": 20,\n",
+    "        \"output_format\": \"json\",\n",
+    "    }\n",
+    ")\n",
+    "print(results)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3dc2f845",
+   "metadata": {},
+   "source": [
+    "### 3. `ScraperAPIAmazonSearchTool` — structured Amazon Search\n",
+    "\n",
+    "Structured product results via `/structured/amazon/search`.\n",
+    "\n",
+    "**Invocation arguments**\n",
+    "\n",
+    "* **`query`** **(required)** – product search terms  \n",
+    "* **Optional** — `country_code`, `tld`, `page`  \n",
+    "* `output_format`: `\"json\"` (default) or `\"csv\"`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "05a4a6ed",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_scraperapi.tools import ScraperAPIAmazonSearchTool\n",
+    "\n",
+    "amazon_search = ScraperAPIAmazonSearchTool()\n",
+    "\n",
+    "products = amazon_search.invoke(\n",
+    "    {\n",
+    "        \"query\": \"noise cancelling headphones\",\n",
+    "        \"tld\": \"co.uk\",\n",
+    "        \"page\": 2,\n",
+    "    }\n",
+    ")\n",
+    "print(products)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "607eb8c8",
+   "metadata": {},
+   "source": [
+    "## Use within an agent\n",
+    "\n",
+    "Here is an example of using the tools in an AI agent. `ScraperAPITool` lets the agent fetch any web page, summarize its content, and follow links by requesting the linked pages."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6541b286",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -U langchain-openai"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cb62e921",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "from langchain.agents import AgentExecutor, create_tool_calling_agent\n",
+    "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
+    "from langchain_openai import ChatOpenAI\n",
+    "from langchain_scraperapi.tools import ScraperAPITool\n",
+    "\n",
+    "os.environ[\"SCRAPERAPI_API_KEY\"] = \"your-scraperapi-key\"\n",
+    "os.environ[\"OPENAI_API_KEY\"] = \"your-openai-key\"\n",
+    "\n",
+    "tools = [ScraperAPITool(output_format=\"markdown\")]\n",
+    "llm = ChatOpenAI(model=\"gpt-4o\", temperature=0)\n",
+    "\n",
+    "prompt = ChatPromptTemplate.from_messages(\n",
+    "    [\n",
+    "        (\n",
+    "            \"system\",\n",
+    "            \"You are a helpful assistant that can browse websites for users. When asked to browse a website or a link, do so with the ScraperAPITool, then provide information from the website based on the user's needs.\",\n",
+    "        ),\n",
+    "        (\"human\", \"{input}\"),\n",
+    "        MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "agent = create_tool_calling_agent(llm, tools, prompt)\n",
+    "agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)\n",
+    "response = agent_executor.invoke(\n",
+    "    {\"input\": \"can you browse hacker news and summarize the first website\"}\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4e90c894",
+   "metadata": {},
+   "source": [
+    "## API reference\n",
+    "\n",
+    "The ScraperAPI documentation pages below describe the additional parameters each tool accepts for customizing requests.\n",
+    "\n",
+    "* [ScraperAPITool](https://docs.scraperapi.com/python/making-requests/customizing-requests)\n",
+    "* [ScraperAPIGoogleSearchTool](https://docs.scraperapi.com/python/make-requests-with-scraperapi-in-python/scraperapi-structured-data-collection-in-python/google-serp-api-structured-data-in-python)\n",
+    "* [ScraperAPIAmazonSearchTool](https://docs.scraperapi.com/python/make-requests-with-scraperapi-in-python/scraperapi-structured-data-collection-in-python/amazon-search-api-structured-data-in-python)\n",
+    "\n",
+    "The LangChain wrappers surface these parameters directly."
+   ]
+  }
+ ],
+ "metadata": {
+  "jupytext": {
+   "cell_metadata_filter": "-all",
+   "main_language": "python",
+   "notebook_metadata_filter": "-all"
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/libs/packages.yml
+++ b/libs/packages.yml
@@ -749,3 +749,6 @@ packages:
 - name: langchain-zeusdb
  repo: zeusdb/langchain-zeusdb
  path: libs/zeusdb
+- name: langchain-scraperapi
+  path: .
+  repo: scraperapi/langchain-scraperapi