mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-04 22:23:50 +00:00
Description

This PR updates the docs for the [langchain-hyperbrowser](https://pypi.org/project/langchain-hyperbrowser/) package. It adds a few tools:

- Scrape Tool
- Crawl Tool
- Extract Tool
- Browser Agents
  - Claude Computer Use
  - OpenAI CUA
  - Browser Use

[Hyperbrowser](https://hyperbrowser.ai/) is a platform for running and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy-to-use solutions for any web scraping needs, such as scraping a single page or crawling an entire site.

Issue: None

Dependencies: None

Twitter Handle: `@hyperbrowser`

# Hyperbrowser

> [Hyperbrowser](https://hyperbrowser.ai) is a platform for running and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy-to-use solutions for any web scraping needs, such as scraping a single page or crawling an entire site.
>
> Key Features:
>
> - Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches
> - Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright
> - Powerful APIs - Easy-to-use APIs for scraping/crawling any site, and much more
> - Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies

For more information about Hyperbrowser, visit the [Hyperbrowser website](https://hyperbrowser.ai) or the [Hyperbrowser docs](https://docs.hyperbrowser.ai).

## Installation and Setup

To get started with `langchain-hyperbrowser`, install the package using pip:

```bash
pip install langchain-hyperbrowser
```

Then configure your credentials by setting the following environment variable:

`HYPERBROWSER_API_KEY=<your-api-key>`

You can get your API key from https://app.hyperbrowser.ai/

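Because the tools read the key from the environment, it can help to fail early with a clear message when `HYPERBROWSER_API_KEY` is missing. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and not part of `langchain-hyperbrowser`.

```python
import os

# Variable name taken from the setup step above.
API_KEY_ENV_VAR = "HYPERBROWSER_API_KEY"


def require_api_key(env=None):
    """Return the API key, or raise a clear error if it is missing or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"{API_KEY_ENV_VAR} is not set; get a key from https://app.hyperbrowser.ai/"
        )
    return key
```

Calling `require_api_key()` once at startup, before constructing any tool or loader, surfaces a missing key immediately rather than at the first request.
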
## Available Tools

Hyperbrowser provides two main categories of tools, Browser Agent Tools and Web Scraping Tools. They are particularly useful for:

- Web scraping and data extraction from complex websites
- Automating repetitive web tasks
- Interacting with web applications that require authentication
- Performing research across multiple websites
- Testing web applications

### Browser Agent Tools

Hyperbrowser provides several Browser Agent tools. The following agents are currently supported:

- Claude Computer Use
- OpenAI CUA
- Browser Use

You can see more details [here](/docs/integrations/tools/hyperbrowser_browser_agent_tools).

#### Browser Use Tool
A general-purpose browser automation tool that can handle various web tasks through natural language instructions.

```python
from langchain_hyperbrowser import HyperbrowserBrowserUseTool

tool = HyperbrowserBrowserUseTool()
result = tool.run({
    "task": "Go to npmjs.com, find the React package, and tell me when it was last updated"
})
print(result)
```

#### OpenAI CUA Tool
Leverages OpenAI's Computer Use Agent capabilities for advanced web interactions and information gathering.

```python
from langchain_hyperbrowser import HyperbrowserOpenAICUATool

tool = HyperbrowserOpenAICUATool()
result = tool.run({
    "task": "Go to Hacker News and summarize the top 5 posts right now"
})
print(result)
```

#### Claude Computer Use Tool
Utilizes Anthropic's Claude for sophisticated web browsing and information processing tasks.

```python
from langchain_hyperbrowser import HyperbrowserClaudeComputerUseTool

tool = HyperbrowserClaudeComputerUseTool()
result = tool.run({
    "task": "Go to GitHub's trending repositories page, and list the top 3 posts there right now"
})
print(result)
```

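The three agent examples above all pass a single `task` string, so the agent can be chosen at runtime from a small lookup table. This is a sketch under that assumption: the class names come from the examples above, while the short keys (`"browser_use"` and so on) and both helper functions are hypothetical. The package is imported lazily, so the lookup itself works without it installed.

```python
import importlib

# Class names as used in the examples above; the short keys are hypothetical labels.
AGENT_TOOL_CLASSES = {
    "browser_use": "HyperbrowserBrowserUseTool",
    "openai_cua": "HyperbrowserOpenAICUATool",
    "claude_computer_use": "HyperbrowserClaudeComputerUseTool",
}


def agent_tool_class_name(agent):
    """Map a short agent label to the tool class name, rejecting unknown labels."""
    try:
        return AGENT_TOOL_CLASSES[agent]
    except KeyError:
        raise ValueError(
            f"Unknown agent {agent!r}; expected one of {sorted(AGENT_TOOL_CLASSES)}"
        ) from None


def run_agent_task(agent, task):
    """Instantiate the chosen tool and run one task.

    Requires langchain-hyperbrowser to be installed and HYPERBROWSER_API_KEY set.
    """
    module = importlib.import_module("langchain_hyperbrowser")  # lazy import
    tool_cls = getattr(module, agent_tool_class_name(agent))
    return tool_cls().run({"task": task})
```

For example, `run_agent_task("openai_cua", "Go to Hacker News and summarize the top 5 posts right now")` would behave like the OpenAI CUA example above.
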
### Web Scraping Tools

Here is a brief description of the Web Scraping Tools available with Hyperbrowser. You can see more details [here](/docs/integrations/tools/hyperbrowser_web_scraping_tools).

#### Scrape Tool
The Scrape Tool allows you to extract content from a single webpage in markdown, HTML, or link format.

```python
from langchain_hyperbrowser import HyperbrowserScrapeTool

tool = HyperbrowserScrapeTool()
result = tool.run({
    "url": "https://example.com",
    "scrape_options": {"formats": ["markdown"]}
})
print(result)
```

#### Crawl Tool
The Crawl Tool enables you to traverse entire websites, starting from a given URL, with configurable page limits.

```python
from langchain_hyperbrowser import HyperbrowserCrawlTool

tool = HyperbrowserCrawlTool()
result = tool.run({
    "url": "https://example.com",
    "max_pages": 2,
    "scrape_options": {"formats": ["markdown"]}
})
print(result)
```

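The Scrape and Crawl inputs shown above share the `url` and `scrape_options` keys, and the crawl input adds `max_pages`. A small builder can catch malformed input before a request is made. This is only a sketch: `scrape_input` and `crawl_input` are hypothetical helpers, and the checks (absolute http(s) URL, positive integer page limit) are assumptions rather than documented rules of the tools.

```python
from urllib.parse import urlparse


def _check_url(url):
    """Accept only absolute http(s) URLs (an assumed sanity check)."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Expected an absolute http(s) URL, got {url!r}")
    return url


def scrape_input(url, formats=("markdown",)):
    """Build the input dict used in the Scrape Tool example."""
    return {"url": _check_url(url), "scrape_options": {"formats": list(formats)}}


def crawl_input(url, max_pages, formats=("markdown",)):
    """Build the input dict used in the Crawl Tool example."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages < 1:
        raise ValueError(f"max_pages must be a positive integer, got {max_pages!r}")
    payload = scrape_input(url, formats)
    payload["max_pages"] = max_pages
    return payload
```

The resulting dict can be passed straight to `tool.run(...)`, for example `HyperbrowserCrawlTool().run(crawl_input("https://example.com", 2))`.
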
#### Extract Tool
The Extract Tool uses AI to pull structured data from web pages based on predefined schemas, which makes it well suited to data extraction tasks.

```python
from langchain_hyperbrowser import HyperbrowserExtractTool
from pydantic import BaseModel

class SimpleExtractionModel(BaseModel):
    title: str

tool = HyperbrowserExtractTool()
result = tool.run({
    "url": "https://example.com",
    "schema": SimpleExtractionModel
})
print(result)
```

## Document Loader

The `HyperbrowserLoader` class in `langchain-hyperbrowser` loads content from a single page or multiple pages, and can also crawl an entire site.
The content can be loaded as markdown or HTML.

```python
from langchain_hyperbrowser import HyperbrowserLoader

loader = HyperbrowserLoader(urls="https://example.com")
docs = loader.load()

print(docs[0])
```

### Advanced Usage

You can specify the operation to be performed by the loader. The default operation is `scrape`. For `scrape`, you can provide a single URL or a list of URLs to be scraped. For `crawl`, you can only provide a single URL. The `crawl` operation will crawl the provided page and subpages and return a document for each page.

```python
loader = HyperbrowserLoader(
    urls="https://hyperbrowser.ai", api_key="YOUR_API_KEY", operation="crawl"
)
```

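The rules above (one URL or a list for `scrape`, exactly one URL for `crawl`) can be checked before the loader is constructed. The following is a minimal sketch; `normalize_loader_args` is a hypothetical helper, and only the `urls` and `operation` keyword names are taken from the examples in this section.

```python
def normalize_loader_args(urls, operation="scrape"):
    """Validate urls/operation and return keyword arguments for the loader.

    Hypothetical helper: enforces that `crawl` gets exactly one URL and that
    `scrape` gets one or more.
    """
    if operation not in ("scrape", "crawl"):
        raise ValueError(f"operation must be 'scrape' or 'crawl', got {operation!r}")

    # Accept either a single URL string or an iterable of URL strings.
    url_list = [urls] if isinstance(urls, str) else list(urls)
    if not url_list:
        raise ValueError("at least one URL is required")

    if operation == "crawl":
        if len(url_list) != 1:
            raise ValueError("crawl accepts a single URL only")
        return {"urls": url_list[0], "operation": "crawl"}

    # For scrape, pass a lone URL as a string and several URLs as a list.
    scrape_urls = url_list[0] if len(url_list) == 1 else url_list
    return {"urls": scrape_urls, "operation": "scrape"}
```

The result can be unpacked into the constructor, for example `HyperbrowserLoader(**normalize_loader_args("https://hyperbrowser.ai", "crawl"))`.
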
Optional params for the loader can also be provided in the `params` argument. For more information on the supported params, visit https://docs.hyperbrowser.ai/reference/sdks/python/scrape#start-scrape-job-and-wait or https://docs.hyperbrowser.ai/reference/sdks/python/crawl#start-crawl-job-and-wait.

```python
loader = HyperbrowserLoader(
    urls="https://example.com",
    api_key="YOUR_API_KEY",
    operation="scrape",
    params={"scrape_options": {"include_tags": ["h1", "h2", "p"]}}
)
```

## Additional Resources

- [Hyperbrowser Docs](https://docs.hyperbrowser.ai/)
- [GitHub](https://github.com/hyperbrowserai/langchain-hyperbrowser/)
- [PyPI](https://pypi.org/project/langchain-hyperbrowser/)