docs: Add tools for hyperbrowser (#30606)

Description

This PR updates the docs for the
[langchain-hyperbrowser](https://pypi.org/project/langchain-hyperbrowser/)
package. It adds the following tools:
 - Scrape Tool
 - Crawl Tool
 - Extract Tool
 - Browser Agents
   - Claude Computer Use
   - OpenAI CUA
   - Browser Use 

[Hyperbrowser](https://hyperbrowser.ai/) is a platform for running and
scaling headless browsers. It lets you launch and manage browser
sessions at scale and provides easy-to-use solutions for any web scraping
needs, such as scraping a single page or crawling an entire site.

Issue
None

Dependencies
None

Twitter Handle
`@hyperbrowser`
4 changed files with 1070 additions and 1 deletion


@@ -25,6 +25,114 @@ And you should configure credentials by setting the following environment variab
Make sure to get your API Key from https://app.hyperbrowser.ai/
## Available Tools
Hyperbrowser provides two main categories of tools, browser agent tools and web scraping tools, which are particularly useful for:
- Web scraping and data extraction from complex websites
- Automating repetitive web tasks
- Interacting with web applications that require authentication
- Performing research across multiple websites
- Testing web applications
### Browser Agent Tools
Hyperbrowser provides a number of browser agent tools. Currently supported are:
- Claude Computer Use
- OpenAI CUA
- Browser Use
You can see more details [here](/docs/integrations/tools/hyperbrowser_browser_agent_tools).
#### Browser Use Tool
A general-purpose browser automation tool that can handle various web tasks through natural language instructions.
```python
from langchain_hyperbrowser import HyperbrowserBrowserUseTool

tool = HyperbrowserBrowserUseTool()
result = tool.run(
    {"task": "Go to npmjs.com, find the React package, and tell me when it was last updated"}
)
print(result)
```
#### OpenAI CUA Tool
Leverages OpenAI's Computer Use Agent capabilities for advanced web interactions and information gathering.
```python
from langchain_hyperbrowser import HyperbrowserOpenAICUATool

tool = HyperbrowserOpenAICUATool()
result = tool.run(
    {"task": "Go to Hacker News and summarize the top 5 posts right now"}
)
print(result)
```
#### Claude Computer Use Tool
Utilizes Anthropic's Claude for sophisticated web browsing and information processing tasks.
```python
from langchain_hyperbrowser import HyperbrowserClaudeComputerUseTool

tool = HyperbrowserClaudeComputerUseTool()
result = tool.run(
    {"task": "Go to GitHub's trending repositories page and list the top 3 repositories right now"}
)
print(result)
```
### Web Scraping Tools
Here is a brief description of the Web Scraping Tools available with Hyperbrowser. You can see more details [here](/docs/integrations/tools/hyperbrowser_web_scraping_tools).
#### Scrape Tool
The Scrape Tool allows you to extract content from a single webpage in markdown, HTML, or link format.
```python
from langchain_hyperbrowser import HyperbrowserScrapeTool

tool = HyperbrowserScrapeTool()
result = tool.run(
    {"url": "https://example.com", "scrape_options": {"formats": ["markdown"]}}
)
print(result)
```
#### Crawl Tool
The Crawl Tool enables you to traverse entire websites, starting from a given URL, with configurable page limits.
```python
from langchain_hyperbrowser import HyperbrowserCrawlTool

tool = HyperbrowserCrawlTool()
result = tool.run(
    {
        "url": "https://example.com",
        "max_pages": 2,
        "scrape_options": {"formats": ["markdown"]},
    }
)
print(result)
```
#### Extract Tool
The Extract Tool uses AI to pull structured data from web pages based on predefined schemas, making it well suited for converting unstructured pages into structured records.
```python
from langchain_hyperbrowser import HyperbrowserExtractTool
from pydantic import BaseModel


class SimpleExtractionModel(BaseModel):
    title: str


tool = HyperbrowserExtractTool()
result = tool.run({"url": "https://example.com", "schema": SimpleExtractionModel})
print(result)
```
## Document Loader
The `HyperbrowserLoader` class in `langchain-hyperbrowser` can easily be used to load content from any single page or multiple pages as well as crawl an entire site.
@@ -39,7 +147,7 @@ docs = loader.load()
print(docs[0])
```
### Advanced Usage
You can specify the operation to be performed by the loader. The default operation is `scrape`. For `scrape`, you can provide a single URL or a list of URLs to be scraped. For `crawl`, you can only provide a single URL. The `crawl` operation will crawl the provided page and subpages and return a document for each page.
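For example, here is a minimal sketch of the `crawl` operation (it assumes the loader accepts `urls` and `operation` keyword arguments, matching the description above; check the package docs for the exact signature):
```python
from langchain_hyperbrowser import HyperbrowserLoader

# Crawl the given page and its subpages; the loader returns one
# Document per crawled page. "operation" defaults to "scrape".
loader = HyperbrowserLoader(
    urls="https://example.com",
    operation="crawl",
)
docs = loader.load()
print(len(docs))
print(docs[0].page_content[:200])
```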


@@ -0,0 +1,392 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Hyperbrowser Browser Agent Tools\n",
"---\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hyperbrowser Browser Agent Tools\n",
"\n",
"[Hyperbrowser](https://hyperbrowser.ai) is a platform for running, running browser agents, and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy to use solutions for any webscraping needs, such as scraping a single page or crawling an entire site.\n",
"\n",
"Key Features:\n",
"- Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches\n",
"- Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright\n",
"- Powerful APIs - Easy to use APIs for scraping/crawling any site, and much more\n",
"- Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies\n",
"\n",
"This notebook provides a quick overview for getting started with Hyperbrowser tools.\n",
"\n",
"For more information about Hyperbrowser, please visit the [Hyperbrowser website](https://hyperbrowser.ai) or if you want to check out the docs, you can visit the [Hyperbrowser docs](https://docs.hyperbrowser.ai).\n",
"\n",
"\n",
"## Browser Agents\n",
"\n",
"Hyperbrowser provides powerful browser agent tools that enable AI models to interact with web browsers programmatically. These browser agents can navigate websites, fill forms, click buttons, extract data, and perform complex web automation tasks.\n",
"\n",
"Browser agents are particularly useful for:\n",
"- Web scraping and data extraction from complex websites\n",
"- Automating repetitive web tasks\n",
"- Interacting with web applications that require authentication\n",
"- Performing research across multiple websites\n",
"- Testing web applications\n",
"\n",
"Hyperbrowser offers three types of browser agent tools:\n",
"- **Browser Use Tool**: A general-purpose browser automation tool\n",
"- **OpenAI CUA Tool**: Integration with OpenAI's Computer Use Agent\n",
"- **Claude Computer Use Tool**: Integration with Anthropic's Claude for computer use\n",
"\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"| Tool | Package | Local | Serializable | JS support |\n",
"| :----------------------- | :--------------------- | :---: | :----------: | :--------: |\n",
"| Browser Use Tool | langchain-hyperbrowser | ❌ | ❌ | ❌ |\n",
"| OpenAI CUA Tool | langchain-hyperbrowser | ❌ | ❌ | ❌ |\n",
"| Claude Computer Use Tool | langchain-hyperbrowser | ❌ | ❌ | ❌ |\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"To access the Hyperbrowser tools you'll need to install the `langchain-hyperbrowser` integration package, and create a Hyperbrowser account and get an API key.\n",
"\n",
"### Credentials\n",
"\n",
"Head to [Hyperbrowser](https://app.hyperbrowser.ai/) to sign up and generate an API key. Once you've done this set the HYPERBROWSER_API_KEY environment variable:\n",
"\n",
"```bash\n",
"export HYPERBROWSER_API_KEY=<your-api-key>\n",
"```\n",
"\n",
"### Installation\n",
"\n",
"Install **langchain-hyperbrowser**.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-hyperbrowser"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"### Browser Use Tool\n",
"\n",
"The `HyperbrowserBrowserUseTool` is a tool to perform web automation tasks using a browser agent, specifically the Browser-Use agent.\n",
"\n",
"```python\n",
"from langchain_hyperbrowser import HyperbrowserBrowserUseTool\n",
"tool = HyperbrowserBrowserUseTool()\n",
"```\n",
"\n",
"### OpenAI CUA Tool\n",
"\n",
"The `HyperbrowserOpenAICUATool` is a specialized tool that leverages OpenAI's Computer Use Agent (CUA) capabilities through Hyperbrowser.\n",
"\n",
"```python\n",
"from langchain_hyperbrowser import HyperbrowserOpenAICUATool\n",
"tool = HyperbrowserOpenAICUATool()\n",
"```\n",
"\n",
"### Claude Computer Use Tool\n",
"\n",
"The `HyperbrowserClaudeComputerUseTool` is a specialized tool that leverages Claude's computer use capabilities through Hyperbrowser.\n",
"\n",
"```python\n",
"from langchain_hyperbrowser import HyperbrowserClaudeComputerUseTool\n",
"tool = HyperbrowserClaudeComputerUseTool()\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Invocation\n",
"\n",
"### Basic Usage\n",
"\n",
"#### Browser Use Tool\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': 'The top 5 posts on Hacker News right now are:\\n1. Stop Syncing Everything - https://sqlsync.dev/posts/stop-syncing-everything/\\n2. Move fast, break things: A review of Abundance by Ezra Klein and Derek Thompson - https://networked.substack.com/p/move-fast-and-break-things\\n3. DEDA Tracking Dots Extraction, Decoding and Anonymisation Toolkit - https://github.com/dfd-tud/deda\\n4. Electron band structure in germanium, my ass (2001) - https://pages.cs.wisc.edu/~kovar/hall.html\\n5. Show HN: I vibecoded a 35k LoC recipe app - https://www.recipeninja.ai', 'error': None}\n"
]
}
],
"source": [
"from langchain_hyperbrowser import HyperbrowserBrowserUseTool\n",
"\n",
"tool = HyperbrowserBrowserUseTool()\n",
"result = tool.run({\"task\": \"Go to Hacker News and summarize the top 5 posts right now\"})\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### OpenAI CUA Tool\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': 'Here are the titles of the top 5 posts on Hacker News right now:\\n\\n1. \"DEDA Tracking Dots Extraction, Decoding and Anonymisation Toolkit\"\\n2. \"A man powers home for eight years using a thousand old laptop batteries\"\\n3. \"Electron band structure in Germanium, my ass\"\\n4. \"Bletchley code breaker Betty Webb dies aged 101\"\\n5. \"Show HN: Zig Topological Sort Library for Parallel Processing\"', 'error': None}\n"
]
}
],
"source": [
"from langchain_hyperbrowser import HyperbrowserOpenAICUATool\n",
"\n",
"tool = HyperbrowserOpenAICUATool()\n",
"result = tool.run(\n",
" {\"task\": \"Go to Hacker News and get me the title of the top 5 posts right now\"}\n",
")\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Claude Computer Use Tool\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': \"Now I'll summarize the top 5 posts on Hacker News as of April 1, 2025:\\n\\n### Top 5 Hacker News Posts Summary\\n\\n1. **A man powers home for eight years using a thousand old laptop batteries** (techoreon.com)\\n - 267 points, posted 5 hours ago\\n - An innovative DIY project where someone managed to power their home using recycled laptop batteries for an extended period.\\n\\n2. **Electron band structure in germanium, my ass** (wisc.edu)\\n - 611 points, posted 8 hours ago\\n - Academic or technical discussion about electron band structure in germanium, possibly with a controversial or humorous take given the title.\\n\\n3. **Bletchley code breaker Betty Webb dies aged 101** (bbc.com)\\n - 575 points, posted 8 hours ago\\n - Obituary for Betty Webb, who worked as a code breaker at Bletchley Park during WWII, passing away at the age of 101.\\n\\n4. **Show HN: Zig Topological Sort Library for Parallel Processing** (github.com/williamw520)\\n - 55 points, posted 3 hours ago\\n - A developer sharing a library written in Zig programming language for topological sorting that supports parallel processing.\\n\\n5. **The Myst Graph: A New Perspective on Myst** (githr.com)\\n - 107 points, posted 5 hours ago\\n - An article presenting a new analysis or visualization of the classic video game Myst, likely using graph theory.\\n\\nThese are the top 5 posts currently trending on Hacker News as of April 1, 2025.\", 'error': None}\n"
]
}
],
"source": [
"from langchain_hyperbrowser import HyperbrowserClaudeComputerUseTool\n",
"\n",
"tool = HyperbrowserClaudeComputerUseTool()\n",
"result = tool.run({\"task\": \"Go to Hacker News and summarize the top 5 posts right now\"})\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### With Custom Session Options\n",
"\n",
"All tools support custom session options:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': 'I have found that the react package was last published 11 hours ago. This is the most recently updated package I could find.', 'error': None}\n"
]
}
],
"source": [
"result = tool.run(\n",
" {\n",
" \"task\": \"Go to npmjs.com, and tell me when react package was last updated.\",\n",
" \"session_options\": {\n",
" \"session_options\": {\"use_proxy\": True, \"accept_cookies\": True}\n",
" },\n",
" }\n",
")\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Async Usage\n",
"\n",
"All tools support async usage:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': 'The page displays information about the \"Example Domain,\" stating that it is used for illustrative purposes and can be utilized without permission. There\\'s a link to \"More information...\" but no specific contact details are provided.', 'error': None}\n"
]
}
],
"source": [
"async def browse_website():\n",
" tool = HyperbrowserBrowserUseTool()\n",
" result = await tool.arun(\n",
" {\n",
" \"task\": \"Go to npmjs.com, click the first visible package, and tell me when it was updated\"\n",
" }\n",
" )\n",
" return result\n",
"\n",
"\n",
"result = await browse_website()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use within an agent\n",
"\n",
"Here's how to use any of the Hyperbrowser tools within an agent:\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"Go to npmjs.com, and tell me when react package was last updated.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"Tool Calls:\n",
" hyperbrowser_browser_use (call_pkAaDjn6kKH9yT3rHDb4hmET)\n",
" Call ID: call_pkAaDjn6kKH9yT3rHDb4hmET\n",
" Args:\n",
" task: Go to npmjs.com and find the last updated date of the React package.\n",
" session_options: None\n",
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
"Name: hyperbrowser_browser_use\n",
"\n",
"{\"data\": \"The last updated date of the React package is a day ago.\", \"error\": null}\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"The React package was last updated a day ago.\n"
]
}
],
"source": [
"from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_hyperbrowser import browser_use_tool\n",
"from langchain_openai import ChatOpenAI\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"# You can use any of the three tools here\n",
"browser_use_tool = HyperbrowserBrowserUseTool()\n",
"agent = create_react_agent(llm, [browser_use_tool])\n",
"\n",
"user_input = \"Go to npmjs.com, and tell me when react package was last updated.\"\n",
"for step in agent.stream(\n",
" {\"messages\": user_input},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuration Options\n",
"\n",
"Claude Computer Use, OpenAI CUA, and Browser Use have the following params available:\n",
"\n",
"- `task`: The task to execute using the agent\n",
"- `max_steps`: The maximum number of interaction steps the agent can take to complete the task\n",
"- `session_options`: Browser session configuration\n",
"\n",
"For more details, see the respective API references:\n",
"- [Browser Use API Reference](https://docs.hyperbrowser.ai/reference/api-reference/agents/browser-use)\n",
"- [OpenAI CUA API Reference](https://docs.hyperbrowser.ai/reference/api-reference/agents/openai-cua)\n",
"- [Claude Computer Use API Reference](https://docs.hyperbrowser.ai/reference/api-reference/agents/claude-computer-use)\n"
]
},
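{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of these parameters (the task string and step limit are illustrative; it assumes `max_steps` and `session_options` are accepted alongside `task`, as listed above):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_hyperbrowser import HyperbrowserBrowserUseTool\n",
"\n",
"# Illustrative sketch: cap the agent at 15 interaction steps and route\n",
"# the session through a proxy (parameter names follow the list above).\n",
"result = HyperbrowserBrowserUseTool().run(\n",
"    {\n",
"        \"task\": \"Go to Hacker News and get the title of the top post\",\n",
"        \"max_steps\": 15,\n",
"        \"session_options\": {\"use_proxy\": True},\n",
"    }\n",
")\n",
"print(result)"
]
},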
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"- [GitHub](https://github.com/hyperbrowserai/langchain-hyperbrowser/)\n",
"- [PyPi](https://pypi.org/project/langchain-hyperbrowser/)\n",
"- [Hyperbrowser Docs](https://docs.hyperbrowser.ai/)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "langchain-hyperbrowser-AlekOQAq-py3.12",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,559 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Hyperbrowser Web Scraping Tools\n",
"---\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hyperbrowser Web Scraping Tools\n",
"\n",
"[Hyperbrowser](https://hyperbrowser.ai) is a platform for running and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy to use solutions for any webscraping needs, such as scraping a single page or crawling an entire site.\n",
"\n",
"Key Features:\n",
"\n",
"- Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches\n",
"- Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright\n",
"- Powerful APIs - Easy to use APIs for scraping/crawling any site, and much more\n",
"- Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies\n",
"\n",
"This notebook provides a quick overview for getting started with Hyperbrowser web tools.\n",
"\n",
"For more information about Hyperbrowser, please visit the [Hyperbrowser website](https://hyperbrowser.ai) or if you want to check out the docs, you can visit the [Hyperbrowser docs](https://docs.hyperbrowser.ai).\n",
"\n",
"## Key Capabilities\n",
"\n",
"### Scrape\n",
"\n",
"Hyperbrowser provides powerful scraping capabilities that allow you to extract data from any webpage. The scraping tool can convert web content into structured formats like markdown or HTML, making it easy to process and analyze the data.\n",
"\n",
"### Crawl\n",
"\n",
"The crawling functionality enables you to navigate through multiple pages of a website automatically. You can set parameters like page limits to control how extensively the crawler explores the site, collecting data from each page it visits.\n",
"\n",
"### Extract\n",
"\n",
"Hyperbrowser's extraction capabilities use AI to pull specific information from webpages according to your defined schema. This allows you to transform unstructured web content into structured data that matches your exact requirements.\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"| Tool | Package | Local | Serializable | JS support |\n",
"| :----------- | :--------------------- | :---: | :----------: | :--------: |\n",
"| Crawl Tool | langchain-hyperbrowser | ❌ | ❌ | ❌ |\n",
"| Scrape Tool | langchain-hyperbrowser | ❌ | ❌ | ❌ |\n",
"| Extract Tool | langchain-hyperbrowser | ❌ | ❌ | ❌ |\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"To access the Hyperbrowser web tools you'll need to install the `langchain-hyperbrowser` integration package, and create a Hyperbrowser account and get an API key.\n",
"\n",
"### Credentials\n",
"\n",
"Head to [Hyperbrowser](https://app.hyperbrowser.ai/) to sign up and generate an API key. Once you've done this set the HYPERBROWSER_API_KEY environment variable:\n",
"\n",
"```bash\n",
"export HYPERBROWSER_API_KEY=<your-api-key>\n",
"```\n",
"\n",
"### Installation\n",
"\n",
"Install **langchain-hyperbrowser**.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-hyperbrowser"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"### Crawl Tool\n",
"\n",
"The `HyperbrowserCrawlTool` is a powerful tool that can crawl entire websites, starting from a given URL. It supports configurable page limits and scraping options.\n",
"\n",
"```python\n",
"from langchain_hyperbrowser import HyperbrowserCrawlTool\n",
"tool = HyperbrowserCrawlTool()\n",
"```\n",
"\n",
"### Scrape Tool\n",
"\n",
"The `HyperbrowserScrapeTool` is a tool that can scrape content from web pages. It supports both markdown and HTML output formats, along with metadata extraction.\n",
"\n",
"```python\n",
"from langchain_hyperbrowser import HyperbrowserScrapeTool\n",
"tool = HyperbrowserScrapeTool()\n",
"```\n",
"\n",
"### Extract Tool\n",
"\n",
"The `HyperbrowserExtractTool` is a powerful tool that uses AI to extract structured data from web pages. It can extract information based predefined schemas.\n",
"\n",
"```python\n",
"from langchain_hyperbrowser import HyperbrowserExtractTool\n",
"tool = HyperbrowserExtractTool()\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Invocation\n",
"\n",
"### Basic Usage\n",
"\n",
"#### Crawl Tool\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': [CrawledPage(metadata={'url': 'https://www.example.com/', 'title': 'Example Domain', 'viewport': 'width=device-width, initial-scale=1', 'sourceURL': 'https://example.com'}, html=None, markdown='Example Domain\\n\\n# Example Domain\\n\\nThis domain is for use in illustrative examples in documents. You may use this\\ndomain in literature without prior coordination or asking for permission.\\n\\n[More information...](https://www.iana.org/domains/example)', links=None, screenshot=None, url='https://example.com', status='completed', error=None)], 'error': None}\n"
]
}
],
"source": [
"from langchain_hyperbrowser import HyperbrowserCrawlTool\n",
"\n",
"result = HyperbrowserCrawlTool().invoke(\n",
" {\n",
" \"url\": \"https://example.com\",\n",
" \"max_pages\": 2,\n",
" \"scrape_options\": {\"formats\": [\"markdown\"]},\n",
" }\n",
")\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Scrape Tool\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': ScrapeJobData(metadata={'url': 'https://www.example.com/', 'title': 'Example Domain', 'viewport': 'width=device-width, initial-scale=1', 'sourceURL': 'https://example.com'}, html=None, markdown='Example Domain\\n\\n# Example Domain\\n\\nThis domain is for use in illustrative examples in documents. You may use this\\ndomain in literature without prior coordination or asking for permission.\\n\\n[More information...](https://www.iana.org/domains/example)', links=None, screenshot=None), 'error': None}\n"
]
}
],
"source": [
"from langchain_hyperbrowser import HyperbrowserScrapeTool\n",
"\n",
"result = HyperbrowserScrapeTool().invoke(\n",
" {\"url\": \"https://example.com\", \"scrape_options\": {\"formats\": [\"markdown\"]}}\n",
")\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Extract Tool\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': {'title': 'Example Domain'}, 'error': None}\n"
]
}
],
"source": [
"from langchain_hyperbrowser import HyperbrowserExtractTool\n",
"from pydantic import BaseModel\n",
"\n",
"\n",
"class SimpleExtractionModel(BaseModel):\n",
" title: str\n",
"\n",
"\n",
"result = HyperbrowserExtractTool().invoke(\n",
" {\n",
" \"url\": \"https://example.com\",\n",
" \"schema\": SimpleExtractionModel,\n",
" }\n",
")\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### With Custom Options\n",
"\n",
"#### Crawl Tool with Custom Options\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': [CrawledPage(metadata={'url': 'https://www.example.com/', 'title': 'Example Domain', 'viewport': 'width=device-width, initial-scale=1', 'sourceURL': 'https://example.com'}, html=None, markdown='Example Domain\\n\\n# Example Domain\\n\\nThis domain is for use in illustrative examples in documents. You may use this\\ndomain in literature without prior coordination or asking for permission.\\n\\n[More information...](https://www.iana.org/domains/example)', links=None, screenshot=None, url='https://example.com', status='completed', error=None)], 'error': None}\n"
]
}
],
"source": [
"result = HyperbrowserCrawlTool().run(\n",
" {\n",
" \"url\": \"https://example.com\",\n",
" \"max_pages\": 2,\n",
" \"scrape_options\": {\n",
" \"formats\": [\"markdown\", \"html\"],\n",
" },\n",
" \"session_options\": {\"use_proxy\": True, \"solve_captchas\": True},\n",
" }\n",
")\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Scrape Tool with Custom Options\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': ScrapeJobData(metadata={'url': 'https://www.example.com/', 'title': 'Example Domain', 'viewport': 'width=device-width, initial-scale=1', 'sourceURL': 'https://example.com'}, html='<html><head>\\n <title>Example Domain</title>\\n\\n <meta charset=\"utf-8\">\\n <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\">\\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\\n \\n</head>\\n\\n<body>\\n<div>\\n <h1>Example Domain</h1>\\n <p>This domain is for use in illustrative examples in documents. You may use this\\n domain in literature without prior coordination or asking for permission.</p>\\n <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\\n</div>\\n\\n\\n</body></html>', markdown='Example Domain\\n\\n# Example Domain\\n\\nThis domain is for use in illustrative examples in documents. You may use this\\ndomain in literature without prior coordination or asking for permission.\\n\\n[More information...](https://www.iana.org/domains/example)', links=None, screenshot=None), 'error': None}\n"
]
}
],
"source": [
"result = HyperbrowserScrapeTool().run(\n",
" {\n",
" \"url\": \"https://example.com\",\n",
" \"scrape_options\": {\n",
" \"formats\": [\"markdown\", \"html\"],\n",
" },\n",
" \"session_options\": {\"use_proxy\": True, \"solve_captchas\": True},\n",
" }\n",
")\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Extract Tool with Custom Schema\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': {'products': [{'price': 9.99, 'title': 'Essence Mascara Lash Princess'}, {'price': 19.99, 'title': 'Eyeshadow Palette with Mirror'}, {'price': 14.99, 'title': 'Powder Canister'}, {'price': 12.99, 'title': 'Red Lipstick'}, {'price': 8.99, 'title': 'Red Nail Polish'}, {'price': 49.99, 'title': 'Calvin Klein CK One'}, {'price': 129.99, 'title': 'Chanel Coco Noir Eau De'}, {'price': 89.99, 'title': \"Dior J'adore\"}, {'price': 69.99, 'title': 'Dolce Shine Eau de'}, {'price': 79.99, 'title': 'Gucci Bloom Eau de'}]}, 'error': None}\n"
]
}
],
"source": [
"from typing import List\n",
"\n",
"from pydantic import BaseModel\n",
"\n",
"\n",
"class ProductSchema(BaseModel):\n",
" title: str\n",
" price: float\n",
"\n",
"\n",
"class ProductsSchema(BaseModel):\n",
" products: List[ProductSchema]\n",
"\n",
"\n",
"result = HyperbrowserExtractTool().run(\n",
" {\n",
" \"url\": \"https://dummyjson.com/products?limit=10\",\n",
" \"schema\": ProductsSchema,\n",
" \"session_options\": {\"session_options\": {\"use_proxy\": True}},\n",
" }\n",
")\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Async Usage\n",
"\n",
"All tools support async usage:\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'List' is not defined",
"output_type": "error",
"traceback": [
"\u001b[31m---------------------------------------------------------------------------\u001b[39m",
"\u001b[31mNameError\u001b[39m Traceback (most recent call last)",
"\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[6]\u001b[39m\u001b[32m, line 10\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mlangchain_hyperbrowser\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[32m 2\u001b[39m HyperbrowserCrawlTool,\n\u001b[32m 3\u001b[39m HyperbrowserExtractTool,\n\u001b[32m 4\u001b[39m HyperbrowserScrapeTool,\n\u001b[32m 5\u001b[39m )\n\u001b[32m 7\u001b[39m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mpydantic\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m BaseModel\n\u001b[32m---> \u001b[39m\u001b[32m10\u001b[39m \u001b[38;5;28;43;01mclass\u001b[39;49;00m\u001b[38;5;250;43m \u001b[39;49m\u001b[34;43;01mExtractionSchema\u001b[39;49;00m\u001b[43m(\u001b[49m\u001b[43mBaseModel\u001b[49m\u001b[43m)\u001b[49m\u001b[43m:\u001b[49m\n\u001b[32m 11\u001b[39m \u001b[43m \u001b[49m\u001b[43mpopular_library_name\u001b[49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[43mList\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m]\u001b[49m\n\u001b[32m 14\u001b[39m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mweb_operations\u001b[39m():\n\u001b[32m 15\u001b[39m \u001b[38;5;66;03m# Crawl\u001b[39;00m\n",
"\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[6]\u001b[39m\u001b[32m, line 11\u001b[39m, in \u001b[36mExtractionSchema\u001b[39m\u001b[34m()\u001b[39m\n\u001b[32m 10\u001b[39m \u001b[38;5;28;01mclass\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mExtractionSchema\u001b[39;00m(BaseModel):\n\u001b[32m---> \u001b[39m\u001b[32m11\u001b[39m popular_library_name: \u001b[43mList\u001b[49m[\u001b[38;5;28mstr\u001b[39m]\n",
"\u001b[31mNameError\u001b[39m: name 'List' is not defined"
]
}
],
"source": [
"from typing import List\n",
"\n",
"from langchain_hyperbrowser import (\n",
" HyperbrowserCrawlTool,\n",
" HyperbrowserExtractTool,\n",
" HyperbrowserScrapeTool,\n",
")\n",
"from pydantic import BaseModel\n",
"\n",
"\n",
"class ExtractionSchema(BaseModel):\n",
" popular_library_name: List[str]\n",
"\n",
"\n",
"async def web_operations():\n",
" # Crawl\n",
" crawl_tool = HyperbrowserCrawlTool()\n",
" crawl_result = await crawl_tool.arun(\n",
" {\n",
" \"url\": \"https://example.com\",\n",
" \"max_pages\": 5,\n",
" \"scrape_options\": {\"formats\": [\"markdown\"]},\n",
" }\n",
" )\n",
"\n",
" # Scrape\n",
" scrape_tool = HyperbrowserScrapeTool()\n",
" scrape_result = await scrape_tool.arun(\n",
" {\"url\": \"https://example.com\", \"scrape_options\": {\"formats\": [\"markdown\"]}}\n",
" )\n",
"\n",
" # Extract\n",
" extract_tool = HyperbrowserExtractTool()\n",
" extract_result = await extract_tool.arun(\n",
" {\n",
" \"url\": \"https://npmjs.com\",\n",
" \"schema\": ExtractionSchema,\n",
" }\n",
" )\n",
"\n",
" return crawl_result, scrape_result, extract_result\n",
"\n",
"\n",
"results = await web_operations()\n",
"print(results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use within an agent\n",
"\n",
"Here's how to use any of the web tools within an agent:\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"Crawl https://example.com and get content from up to 5 pages\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"Tool Calls:\n",
" hyperbrowser_crawl_data (call_G2ofdHOqjdnJUZu4hhbuga58)\n",
" Call ID: call_G2ofdHOqjdnJUZu4hhbuga58\n",
" Args:\n",
" url: https://example.com\n",
" max_pages: 5\n",
" scrape_options: {'formats': ['markdown']}\n",
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
"Name: hyperbrowser_crawl_data\n",
"\n",
"{'data': [CrawledPage(metadata={'url': 'https://www.example.com/', 'title': 'Example Domain', 'viewport': 'width=device-width, initial-scale=1', 'sourceURL': 'https://example.com'}, html=None, markdown='Example Domain\\n\\n# Example Domain\\n\\nThis domain is for use in illustrative examples in documents. You may use this\\ndomain in literature without prior coordination or asking for permission.\\n\\n[More information...](https://www.iana.org/domains/example)', links=None, screenshot=None, url='https://example.com', status='completed', error=None)], 'error': None}\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"I have crawled the website [https://example.com](https://example.com) and retrieved content from the first page. Here is the content in markdown format:\n",
"\n",
"```\n",
"Example Domain\n",
"\n",
"# Example Domain\n",
"\n",
"This domain is for use in illustrative examples in documents. You may use this\n",
"domain in literature without prior coordination or asking for permission.\n",
"\n",
"[More information...](https://www.iana.org/domains/example)\n",
"```\n",
"\n",
"If you would like to crawl more pages or need additional information, please let me know!\n"
]
}
],
"source": [
"from langchain_hyperbrowser import HyperbrowserCrawlTool\n",
"from langchain_openai import ChatOpenAI\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"# Initialize the crawl tool\n",
"crawl_tool = HyperbrowserCrawlTool()\n",
"\n",
"# Create the agent with the crawl tool\n",
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"agent = create_react_agent(llm, [crawl_tool])\n",
"user_input = \"Crawl https://example.com and get content from up to 5 pages\"\n",
"for step in agent.stream(\n",
" {\"messages\": user_input},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuration Options\n",
"\n",
"### Common Options\n",
"\n",
"All tools support these basic configuration options:\n",
"\n",
"- `url`: The URL to process\n",
"- `session_options`: Browser session configuration\n",
" - `use_proxy`: Whether to use a proxy\n",
" - `solve_captchas`: Whether to automatically solve CAPTCHAs\n",
" - `accept_cookies`: Whether to accept cookies\n",
"\n",
"### Tool-Specific Options\n",
"\n",
"#### Crawl Tool\n",
"\n",
"- `max_pages`: Maximum number of pages to crawl\n",
"- `scrape_options`: Options for scraping each page\n",
" - `formats`: List of output formats (markdown, html)\n",
"\n",
"#### Scrape Tool\n",
"\n",
"- `scrape_options`: Options for scraping the page\n",
" - `formats`: List of output formats (markdown, html)\n",
"\n",
"#### Extract Tool\n",
"\n",
"- `schema`: Pydantic model defining the structure to extract\n",
"- `extraction_prompt`: Natural language prompt for extraction\n",
"\n",
"For more details, see the respective API references:\n",
"\n",
"- [Crawl API Reference](https://docs.hyperbrowser.ai/reference/api-reference/crawl)\n",
"- [Scrape API Reference](https://docs.hyperbrowser.ai/reference/api-reference/scrape)\n",
"- [Extract API Reference](https://docs.hyperbrowser.ai/reference/api-reference/extract)\n"
]
},
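{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the Extract Tool options (the schema and prompt here are illustrative; it assumes `extraction_prompt` can be passed alongside `schema`, as listed above):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_hyperbrowser import HyperbrowserExtractTool\n",
"from pydantic import BaseModel\n",
"\n",
"\n",
"class PageSummary(BaseModel):\n",
"    title: str\n",
"    summary: str\n",
"\n",
"\n",
"# Illustrative sketch: guide the schema-based extraction with a prompt.\n",
"result = HyperbrowserExtractTool().run(\n",
"    {\n",
"        \"url\": \"https://example.com\",\n",
"        \"schema\": PageSummary,\n",
"        \"extraction_prompt\": \"Extract the page title and a one-sentence summary\",\n",
"    }\n",
")\n",
"print(result)"
]
},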
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"- [GitHub](https://github.com/hyperbrowserai/langchain-hyperbrowser/)\n",
"- [PyPi](https://pypi.org/project/langchain-hyperbrowser/)\n",
"- [Hyperbrowser Docs](https://docs.hyperbrowser.ai/)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "langchain-hyperbrowser-AlekOQAq-py3.12",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -152,6 +152,16 @@ WEBBROWSING_TOOL_FEAT_TABLE = {
"interactions": True, "interactions": True,
"pricing": "Free trial, with pay-as-you-go and flat rate plans after", "pricing": "Free trial, with pay-as-you-go and flat rate plans after",
}, },
"Hyperbrowser Browser Agent Tools": {
"link": "/docs/integrations/tools/hyperbrowser_browser_agent_tools",
"interactions": True,
"pricing": "Free trial, with flat rate plans and pre-paid credits after",
},
"Hyperbrowser Web Scraping Tools": {
"link": "/docs/integrations/tools/hyperbrowser_web_scraping_tools",
"interactions": False,
"pricing": "Free trial, with flat rate plans and pre-paid credits after",
},
}
DATABASE_TOOL_FEAT_TABLE = {