mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-21 06:33:41 +00:00
Adds documentation for the integration langchain-scraperapi, which contains 3 tools using the ScraperAPI service. The tools give AI agents the ability to:
- Scrape the web and return HTML/text/markdown
- Perform Google search and return JSON output
- Perform Amazon search and return JSON output

For reference, here is the official repo for langchain_scraperapi: https://github.com/scraperapi/langchain-scraperapi
73 lines
2.0 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ScraperAPI\n",
    "\n",
    "[ScraperAPI](https://www.scraperapi.com/) lets you collect data from any public website through its web scraping API, without managing proxies, browsers, or CAPTCHA handling yourself. [langchain-scraperapi](https://github.com/scraperapi/langchain-scraperapi) wraps this service, making it easy for AI agents to browse the web and scrape data from it.\n",
    "\n",
    "## Installation and Setup\n",
    "\n",
    "- Install the Python package with `pip install langchain-scraperapi`.\n",
    "- Obtain an API key from [ScraperAPI](https://www.scraperapi.com/) and set the environment variable `SCRAPERAPI_API_KEY`.\n",
    "\n",
    "### Tools\n",
    "\n",
    "The package offers three tools: `ScraperAPITool` scrapes any website, `ScraperAPIGoogleSearchTool` returns structured Google search results, and `ScraperAPIAmazonSearchTool` returns structured Amazon search results.\n",
    "\n",
    "To import them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install langchain-scraperapi\n",
    "\n",
    "from langchain_scraperapi.tools import (\n",
    "    ScraperAPIAmazonSearchTool,\n",
    "    ScraperAPIGoogleSearchTool,\n",
    "    ScraperAPITool,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example use:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assumes SCRAPERAPI_API_KEY is set in the environment (see Installation and Setup).\n",
    "tool = ScraperAPITool()\n",
    "\n",
    "# Scrape a page and request the result as markdown.\n",
    "result = tool.invoke({\"url\": \"https://example.com\", \"output_format\": \"markdown\"})\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a more detailed walkthrough of how to use these tools, visit the [official repository](https://github.com/scraperapi/langchain-scraperapi)."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
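The setup step in the notebook relies on `SCRAPERAPI_API_KEY` being present in the environment before any tool is created. A minimal sketch of an up-front check follows; `require_api_key` is a hypothetical helper written for illustration and is not part of langchain-scraperapi.

```python
import os


def require_api_key(env=os.environ, name="SCRAPERAPI_API_KEY"):
    """Return the key stored under `name`, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of langchain-scraperapi.
    `env` is any mapping, which makes the check easy to exercise without
    touching the real environment.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it before creating any ScraperAPI tool."
        )
    return key
```

Calling `require_api_key()` in a cell before `ScraperAPITool()` would surface a missing key as one explicit error instead of a failure partway through a scrape.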