docs: Add Brightdata integration documentation (#31114)

Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
  - Example: "core: add foobar LLM"

- **Description:** Integrated the Bright Data package to enable
LangChain users to seamlessly incorporate Bright Data into their agents.
 - **Dependencies:** None
- **LinkedIn handle**:[Bright
Data](https://www.linkedin.com/company/bright-data)

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
meirk-brd 2025-05-11 19:07:21 +03:00 committed by GitHub
parent 0d59fe9789
commit e6147ce5d2
5 changed files with 937 additions and 0 deletions


@ -0,0 +1,34 @@
# Bright Data
[Bright Data](https://brightdata.com) is a web data platform that provides tools for web scraping, SERP collection, and accessing geo-restricted content.
Bright Data allows developers to extract structured data from websites, perform search engine queries, and access content that might be otherwise blocked or geo-restricted. The platform is designed to help overcome common web scraping challenges including anti-bot systems, CAPTCHAs, and IP blocks.
## Installation and Setup
```bash
pip install langchain-brightdata
```
You'll need to set up your Bright Data API key:
```python
import os
os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"
```
Or you can pass it directly when initializing tools:
```python
from langchain_brightdata import BrightDataSERP
tool = BrightDataSERP(bright_data_api_key="your-api-key")
```
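The two options above can be combined in a small helper that prefers an explicitly passed key and falls back to the environment variable. This is a minimal sketch, not part of `langchain-brightdata`; the `resolve_api_key` name and its error message are hypothetical and exist only for illustration.

```python
import os
from typing import Mapping, Optional

# Environment variable name used in the setup snippet above.
ENV_VAR = "BRIGHT_DATA_API_KEY"


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else the value of BRIGHT_DATA_API_KEY.

    Hypothetical helper for illustration; not provided by langchain-brightdata.
    """
    # Allow a custom mapping to be injected (handy in tests); default to os.environ.
    env = os.environ if env is None else env
    key = explicit or env.get(ENV_VAR, "")
    if not key:
        raise ValueError(f"Set {ENV_VAR} or pass the key explicitly.")
    return key
```

The returned value can then be passed as `bright_data_api_key=resolve_api_key()` when constructing any of the tools.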
## Tools
The Bright Data integration provides several tools:
- [BrightDataSERP](/docs/integrations/tools/brightdata_serp) - Search engine results collection with geo-targeting
- [BrightDataUnlocker](/docs/integrations/tools/brightdata_unlocker) - Access any public website that might be geo-restricted or bot-protected
- [BrightDataWebScraperAPI](/docs/integrations/tools/brightdata-webscraperapi) - Extract structured data from 100+ popular domains, e.g. Amazon product details and LinkedIn profiles
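
As a rough sketch of how the three tools differ at call time, the snippet below builds the input dictionary each tool is invoked with. The keys (`query`, `url`, `country`, `language`, `data_format`, `dataset_type`, `zipcode`) are the ones shown on the tool pages linked above; the `build_*` helper names are hypothetical, and they only assemble plain dictionaries, so nothing here calls the Bright Data API.

```python
from typing import Any, Dict, Optional


def build_serp_input(
    query: str, country: str = "us", language: str = "en"
) -> Dict[str, Any]:
    """Input for BrightDataSERP: a search query plus geo and language targeting."""
    return {"query": query, "country": country, "language": language}


def build_unlocker_input(
    url: str, country: Optional[str] = None, data_format: str = "html"
) -> Dict[str, Any]:
    """Input for BrightDataUnlocker: a URL, an output format, and an optional country."""
    payload: Dict[str, Any] = {"url": url, "data_format": data_format}
    if country is not None:
        payload["country"] = country  # only sent when geo-specific access is wanted
    return payload


def build_scraper_input(
    url: str, dataset_type: str, zipcode: Optional[str] = None
) -> Dict[str, Any]:
    """Input for BrightDataWebScraperAPI: a URL, a dataset type, and an optional zipcode."""
    payload: Dict[str, Any] = {"url": url, "dataset_type": dataset_type}
    if zipcode is not None:
        payload["zipcode"] = zipcode  # location-specific data, e.g. pricing
    return payload
```

Each dictionary can be passed to the matching tool's `invoke` method, for example `serp_tool.invoke(build_serp_input("latest AI research papers"))`.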


@ -0,0 +1,292 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BrightDataWebScraperAPI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Bright Data](https://brightdata.com/) provides a powerful Web Scraper API that allows you to extract structured data from 100+ ppular domains, including Amazon product details, LinkedIn profiles, and more, making it particularly useful for AI agents requiring reliable structured web data feeds."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Integration details\n",
"\n",
"|Class|Package|Serializable|JS support|Package latest|\n",
"|:--|:--|:-:|:-:|:-:|\n",
"|[BrightDataWebScraperAPI](https://pypi.org/project/langchain-brightdata/)|[langchain-brightdata](https://pypi.org/project/langchain-brightdata/)|✅|❌|![PyPI - Version](https://img.shields.io/pypi/v/langchain-brightdata?style=flat-square&label=%20)|\n",
"\n",
"### Tool features\n",
"\n",
"|Native async|Returns artifact|Return data|Pricing|\n",
"|:-:|:-:|:--|:-:|\n",
"|❌|❌|Structured data from websites (Amazon products, LinkedIn profiles, etc.)|Requires Bright Data account|\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"The integration lives in the `langchain-brightdata` package.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"pip install langchain-brightdata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You'll need a Bright Data API key to use this tool. You can set it as an environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"BRIGHT_DATA_API_KEY\"] = \"your-api-key\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or pass it directly when initializing the tool:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataWebScraperAPI\n",
"\n",
"scraper_tool = BrightDataWebScraperAPI(bright_data_api_key=\"your-api-key\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Here we show how to instantiate an instance of the BrightDataWebScraperAPI tool. This tool allows you to extract structured data from various websites including Amazon product details, LinkedIn profiles, and more using Bright Data's Dataset API.\n",
"\n",
"The tool accepts various parameters during instantiation:\n",
"\n",
"- `bright_data_api_key` (required, str): Your Bright Data API key for authentication.\n",
"- `dataset_mapping` (optional, Dict[str, str]): A dictionary mapping dataset types to their corresponding Bright Data dataset IDs. The default mapping includes:\n",
" - \"amazon_product\": \"gd_l7q7dkf244hwjntr0\"\n",
" - \"amazon_product_reviews\": \"gd_le8e811kzy4ggddlq\"\n",
" - \"linkedin_person_profile\": \"gd_l1viktl72bvl7bjuj0\"\n",
" - \"linkedin_company_profile\": \"gd_l1vikfnt1wgvvqz95w\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Invocation\n",
"\n",
"### Basic Usage"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataWebScraperAPI\n",
"\n",
"# Initialize the tool\n",
"scraper_tool = BrightDataWebScraperAPI(\n",
" bright_data_api_key=\"your-api-key\" # Optional if set in environment variables\n",
")\n",
"\n",
"# Extract Amazon product data\n",
"results = scraper_tool.invoke(\n",
" {\"url\": \"https://www.amazon.com/dp/B08L5TNJHG\", \"dataset_type\": \"amazon_product\"}\n",
")\n",
"\n",
"print(results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Advanced Usage with Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataWebScraperAPI\n",
"\n",
"# Initialize with default parameters\n",
"scraper_tool = BrightDataWebScraperAPI(bright_data_api_key=\"your-api-key\")\n",
"\n",
"# Extract Amazon product data with location-specific pricing\n",
"results = scraper_tool.invoke(\n",
" {\n",
" \"url\": \"https://www.amazon.com/dp/B08L5TNJHG\",\n",
" \"dataset_type\": \"amazon_product\",\n",
" \"zipcode\": \"10001\", # Get pricing for New York City\n",
" }\n",
")\n",
"\n",
"print(results)\n",
"\n",
"# Extract LinkedIn profile data\n",
"linkedin_results = scraper_tool.invoke(\n",
" {\n",
" \"url\": \"https://www.linkedin.com/in/satyanadella/\",\n",
" \"dataset_type\": \"linkedin_person_profile\",\n",
" }\n",
")\n",
"\n",
"print(linkedin_results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Customization Options\n",
"\n",
"The BrightDataWebScraperAPI tool accepts several parameters for customization:\n",
"\n",
"|Parameter|Type|Description|\n",
"|:--|:--|:--|\n",
"|`url`|str|The URL to extract data from|\n",
"|`dataset_type`|str|Type of dataset to use (e.g., \"amazon_product\")|\n",
"|`zipcode`|str|Optional zipcode for location-specific data|\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Available Dataset Types\n",
"\n",
"The tool supports the following dataset types for structured data extraction:\n",
"\n",
"|Dataset Type|Description|\n",
"|:--|:--|\n",
"|`amazon_product`|Extract detailed Amazon product data|\n",
"|`amazon_product_reviews`|Extract Amazon product reviews|\n",
"|`linkedin_person_profile`|Extract LinkedIn person profile data|\n",
"|`linkedin_company_profile`|Extract LinkedIn company profile data|\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use within an agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataWebScraperAPI\n",
"from langchain_google_genai import ChatGoogleGenerativeAI\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"# Initialize the LLM\n",
"llm = ChatGoogleGenerativeAI(model=\"gemini-2.0-flash\", google_api_key=\"your-api-key\")\n",
"\n",
"# Initialize the Bright Data Web Scraper API tool\n",
"scraper_tool = BrightDataWebScraperAPI(bright_data_api_key=\"your-api-key\")\n",
"\n",
"# Create the agent with the tool\n",
"agent = create_react_agent(llm, [scraper_tool])\n",
"\n",
"# Provide a user query\n",
"user_input = \"Scrape Amazon product data for https://www.amazon.com/dp/B0D2Q9397Y?th=1 in New York (zipcode 10001).\"\n",
"\n",
"# Stream the agent's step-by-step output\n",
"for step in agent.stream(\n",
" {\"messages\": user_input},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"- [Bright Data API Documentation](https://docs.brightdata.com/scraping-automation/web-scraper-api/overview)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@ -0,0 +1,294 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a6f91f20",
"metadata": {},
"source": [
"# BrightDataSERP\n",
"\n",
"[Bright Data](https://brightdata.com/) provides a powerful SERP API that allows you to query search engines (Google,Bing.DuckDuckGo,Yandex) with geo-targeting and advanced customization options, particularly useful for AI agents requiring real-time web information.\n",
"\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"\n",
"|Class|Package|Serializable|JS support|Package latest|\n",
"|:--|:--|:-:|:-:|:-:|\n",
"|[BrightDataSERP](https://pypi.org/project/langchain-brightdata/)|[langchain-brightdata](https://pypi.org/project/langchain-brightdata/)|✅|❌|![PyPI - Version](https://img.shields.io/pypi/v/langchain-brightdata?style=flat-square&label=%20)|\n",
"\n",
"\n",
"### Tool features\n",
"\n",
"\n",
"|Native async|Returns artifact|Return data|Pricing|\n",
"|:-:|:-:|:--|:-:|\n",
"|❌|❌|Title, URL, snippet, position, and other search result data|Requires Bright Data account|\n",
"\n",
"\n",
"\n",
"## Setup\n",
"\n",
"The integration lives in the `langchain-brightdata` package."
]
},
{
"cell_type": "raw",
"id": "f85b4089",
"metadata": {},
"source": [
"pip install langchain-brightdata"
]
},
{
"cell_type": "markdown",
"id": "b15e9266",
"metadata": {},
"source": [
"### Credentials\n",
"\n",
"You'll need a Bright Data API key to use this tool. You can set it as an environment variable:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e0b178a2-8816-40ca-b57c-ccdd86dde9c9",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"BRIGHT_DATA_API_KEY\"] = \"your-api-key\""
]
},
{
"cell_type": "markdown",
"id": "bc5ab717-fd27-4c59-b912-bdd099541478",
"metadata": {},
"source": [
"Or pass it directly when initializing the tool:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a6c2f136-6367-4f1f-825d-ae741e1bf281",
"metadata": {},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataSERP\n",
"\n",
"serp_tool = BrightDataSERP(bright_data_api_key=\"your-api-key\")"
]
},
{
"cell_type": "markdown",
"id": "eed8cfcc",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Here we show how to instantiate an instance of the BrightDataSERP tool. This tool allows you to perform search engine queries with various customization options including geo-targeting, language preferences, device type simulation, and specific search types using Bright Data's SERP API.\n",
"\n",
"The tool accepts various parameters during instantiation:\n",
"\n",
"- `bright_data_api_key` (required, str): Your Bright Data API key for authentication.\n",
"- `search_engine` (optional, str): Search engine to use for queries. Default is \"google\". Other options include \"bing\", \"yahoo\", \"yandex\", \"DuckDuckGo\" etc.\n",
"- `country` (optional, str): Two-letter country code for localized search results (e.g., \"us\", \"gb\", \"de\", \"jp\"). Default is \"us\".\n",
"- `language` (optional, str): Two-letter language code for the search results (e.g., \"en\", \"es\", \"fr\", \"de\"). Default is \"en\".\n",
"- `results_count` (optional, int): Number of search results to return. Default is 10. Maximum value is typically 100.\n",
"- `search_type` (optional, str): Type of search to perform. Options include:\n",
" - None (default): Regular web search\n",
" - \"isch\": Images search\n",
" - \"shop\": Shopping search\n",
" - \"nws\": News search\n",
" - \"jobs\": Jobs search\n",
"- `device_type` (optional, str): Device type to simulate for the search. Options include:\n",
" - None (default): Desktop device\n",
" - \"mobile\": Generic mobile device\n",
" - \"ios\": iOS device (iPhone)\n",
" - \"android\": Android device\n",
"- `parse_results` (optional, bool): Whether to return parsed JSON results. Default is False, which returns raw HTML response."
]
},
{
"cell_type": "markdown",
"id": "1c97218f-f366-479d-8bf7-fe9f2f6df73f",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "markdown",
"id": "902dc1fd",
"metadata": {},
"source": [
"### Basic Usage"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b3ddfe9-ca79-494c-a7ab-1f56d9407a64",
"metadata": {},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataSERP\n",
"\n",
"# Initialize the tool\n",
"serp_tool = BrightDataSERP(\n",
" bright_data_api_key=\"your-api-key\" # Optional if set in environment variables\n",
")\n",
"\n",
"# Run a basic search\n",
"results = serp_tool.invoke(\"latest AI research papers\")\n",
"\n",
"print(results)"
]
},
{
"cell_type": "markdown",
"id": "74147a1a",
"metadata": {},
"source": [
"### Advanced Usage with Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "65310a8b-eb0c-4d9e-a618-4f4abe2414fc",
"metadata": {},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataSERP\n",
"\n",
"# Initialize with default parameters\n",
"serp_tool = BrightDataSERP(\n",
" bright_data_api_key=\"your-api-key\",\n",
" search_engine=\"google\", # Default\n",
" country=\"us\", # Default\n",
" language=\"en\", # Default\n",
" results_count=10, # Default\n",
" parse_results=True, # Get structured JSON results\n",
")\n",
"\n",
"# Use with specific parameters for this search\n",
"results = serp_tool.invoke(\n",
" {\n",
" \"query\": \"best electric vehicles\",\n",
" \"country\": \"de\", # Get results as if searching from Germany\n",
" \"language\": \"de\", # Get results in German\n",
" \"search_type\": \"shop\", # Get shopping results\n",
" \"device_type\": \"mobile\", # Simulate a mobile device\n",
" \"results_count\": 15,\n",
" }\n",
")\n",
"\n",
"print(results)"
]
},
{
"cell_type": "markdown",
"id": "d6e73897",
"metadata": {},
"source": [
"## Customization Options\n",
"\n",
"The BrightDataSERP tool accepts several parameters for customization:\n",
"\n",
"|Parameter|Type|Description|\n",
"|:--|:--|:--|\n",
"|`query`|str|The search query to perform|\n",
"|`search_engine`|str|Search engine to use (default: \"google\")|\n",
"|`country`|str|Two-letter country code for localized results (default: \"us\")|\n",
"|`language`|str|Two-letter language code (default: \"en\")|\n",
"|`results_count`|int|Number of results to return (default: 10)|\n",
"|`search_type`|str|Type of search: None (web), \"isch\" (images), \"shop\", \"nws\" (news), \"jobs\"|\n",
"|`device_type`|str|Device type: None (desktop), \"mobile\", \"ios\", \"android\"|\n",
"|`parse_results`|bool|Whether to return structured JSON (default: False)|\n"
]
},
{
"cell_type": "markdown",
"id": "e3353ce6",
"metadata": {},
"source": [
"## Use within an agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c91c32f",
"metadata": {},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataSERP\n",
"from langchain_google_genai import ChatGoogleGenerativeAI\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"# Initialize the LLM\n",
"llm = ChatGoogleGenerativeAI(model=\"gemini-2.0-flash\", google_api_key=\"your-api-key\")\n",
"\n",
"# Initialize the Bright Data SERP tool\n",
"serp_tool = BrightDataSERP(\n",
" bright_data_api_key=\"your-api-key\",\n",
" search_engine=\"google\",\n",
" country=\"us\",\n",
" language=\"en\",\n",
" results_count=10,\n",
" parse_results=True,\n",
")\n",
"\n",
"# Create the agent\n",
"agent = create_react_agent(llm, [serp_tool])\n",
"\n",
"# Provide a user query\n",
"user_input = \"Search for 'best electric vehicles' shopping results in Germany in German using mobile.\"\n",
"\n",
"# Stream the agent's output step-by-step\n",
"for step in agent.stream(\n",
" {\"messages\": user_input},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "markdown",
"id": "e8dec55a",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"- [Bright Data API Documentation](https://docs.brightdata.com/scraping-automation/serp-api/introduction)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@ -0,0 +1,314 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BrightDataUnlocker"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Bright Data](https://brightdata.com/) provides a powerful Web Unlocker API that allows you to access websites that might be protected by anti-bot measures, geo-restrictions, or other access limitations, making it particularly useful for AI agents requiring reliable web content extraction."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Integration details"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"|Class|Package|Serializable|JS support|Package latest|\n",
"|:--|:--|:-:|:-:|:-:|\n",
"|[BrightDataUnlocker](https://pypi.org/project/langchain-brightdata/)|[langchain-brightdata](https://pypi.org/project/langchain-brightdata/)|✅|❌|![PyPI - Version](https://img.shields.io/pypi/v/langchain-brightdata?style=flat-square&label=%20)|\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tool features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"|Native async|Returns artifact|Return data|Pricing|\n",
"|:-:|:-:|:--|:-:|\n",
"|❌|❌|HTML, Markdown, or screenshot of web pages|Requires Bright Data account|\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The integration lives in the `langchain-brightdata` package."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"pip install langchain-brightdata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You'll need a Bright Data API key to use this tool. You can set it as an environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"BRIGHT_DATA_API_KEY\"] = \"your-api-key\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or pass it directly when initializing the tool:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataUnlocker\n",
"\n",
"unlocker_tool = BrightDataUnlocker(bright_data_api_key=\"your-api-key\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Here we show how to instantiate an instance of the BrightDataUnlocker tool. This tool allows you to access websites that may be protected by anti-bot measures, geo-restrictions, or other access limitations using Bright Data's Web Unlocker service.\n",
"\n",
"The tool accepts various parameters during instantiation:\n",
"\n",
"- `bright_data_api_key` (required, str): Your Bright Data API key for authentication.\n",
"- `format` (optional, Literal[\"raw\"]): Format of the response content. Default is \"raw\".\n",
"- `country` (optional, str): Two-letter country code for geo-specific access (e.g., \"us\", \"gb\", \"de\", \"jp\"). Set this when you need to view the website as if accessing from a specific country. Default is None.\n",
"- `zone` (optional, str): Bright Data zone to use for the request. The \"unlocker\" zone is optimized for accessing websites that might block regular requests. Default is \"unlocker\".\n",
"- `data_format` (optional, Literal[\"html\", \"markdown\", \"screenshot\"]): Output format for the retrieved content. Options include:\n",
" - \"html\" - Returns the standard HTML content (default)\n",
" - \"markdown\" - Returns content converted to markdown format\n",
" - \"screenshot\" - Returns a PNG screenshot of the rendered page"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Basic Usage"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataUnlocker\n",
"\n",
"# Initialize the tool\n",
"unlocker_tool = BrightDataUnlocker(\n",
" bright_data_api_key=\"your-api-key\" # Optional if set in environment variables\n",
")\n",
"\n",
"# Access a webpage\n",
"result = unlocker_tool.invoke(\"https://example.com\")\n",
"\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Advanced Usage with Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataUnlocker\n",
"\n",
"unlocker_tool = BrightDataUnlocker(\n",
" bright_data_api_key=\"your-api-key\",\n",
")\n",
"\n",
"# Access a webpage with specific parameters\n",
"result = unlocker_tool.invoke(\n",
" {\n",
" \"url\": \"https://example.com/region-restricted-content\",\n",
" \"country\": \"gb\", # Access as if from Great Britain\n",
" \"data_format\": \"html\", # Get content in markdown format\n",
" \"zone\": \"unlocker\", # Use the unlocker zone\n",
" }\n",
")\n",
"\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Customization Options"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The BrightDataUnlocker tool accepts several parameters for customization:\n",
"\n",
"|Parameter|Type|Description|\n",
"|:--|:--|:--|\n",
"|`url`|str|The URL to access|\n",
"|`format`|str|Format of the response content (default: \"raw\")|\n",
"|`country`|str|Two-letter country code for geo-specific access (e.g., \"us\", \"gb\")|\n",
"|`zone`|str|Bright Data zone to use (default: \"unlocker\")|\n",
"|`data_format`|str|Output format: None (HTML), \"markdown\", or \"screenshot\"|\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Format Options\n",
"\n",
"The `data_format` parameter allows you to specify how the content should be returned:\n",
"\n",
"- `None` or `\"html\"` (default): Returns the standard HTML content of the page\n",
"- `\"markdown\"`: Returns the content converted to markdown format, which is useful for feeding directly to LLMs\n",
"- `\"screenshot\"`: Returns a PNG screenshot of the rendered page, useful for visual analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use within an agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from langchain_brightdata import BrightDataUnlocker\n",
"from langchain_google_genai import ChatGoogleGenerativeAI\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"# Initialize the LLM\n",
"llm = ChatGoogleGenerativeAI(model=\"gemini-2.0-flash\", google_api_key=\"your-api-key\")\n",
"\n",
"# Initialize the tool\n",
"bright_data_tool = BrightDataUnlocker(bright_data_api_key=\"your-api-key\")\n",
"\n",
"# Create the agent\n",
"agent = create_react_agent(llm, [bright_data_tool])\n",
"\n",
"# Input URLs or prompt\n",
"user_input = \"Get the content from https://example.com/region-restricted-page - access it from GB\"\n",
"\n",
"# Stream the agent's output step by step\n",
"for step in agent.stream(\n",
" {\"messages\": user_input},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"- [Bright Data API Documentation](https://docs.brightdata.com/scraping-automation/web-unlocker/introduction)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@ -659,3 +659,6 @@ packages:
- name: langchain-aerospike
path: .
repo: aerospike/langchain-aerospike
- name: langchain-brightdata
repo: luminati-io/langchain-brightdata
path: .