Merge branch 'master' into wip-v1.0

This commit is contained in:
Mason Daugherty
2025-09-16 22:04:42 -04:00
committed by GitHub
8 changed files with 418 additions and 26 deletions


@@ -3,8 +3,4 @@
Hi there! Thank you for even being interested in contributing to LangChain.
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
## New features
For new features, please start a new [discussion on our forum](https://forum.langchain.com/), where the maintainers will help with scoping out the necessary changes.
To learn how to contribute to LangChain, please follow the [contribution guide here](https://docs.langchain.com/oss/python/contributing).


@@ -26,9 +26,10 @@
# • release — prepare a new release
#
# Allowed Scopes (optional):
# core, cli, langchain, standard-tests, docs, anthropic, chroma, deepseek,
# exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai,
# perplexity, prompty, qdrant, xai
# core, cli, langchain, langchain_v1, langchain_legacy, standard-tests,
# text-splitters, docs, anthropic, chroma, deepseek, exa, fireworks, groq,
# huggingface, mistralai, nomic, ollama, openai, perplexity, prompty, qdrant,
# xai, infra
#
# Rules & Tips for New Committers:
# 1. Subject (type) must start with a lowercase letter and, if possible, be
@@ -84,6 +85,7 @@ jobs:
cli
langchain
langchain_v1
langchain_legacy
standard-tests
text-splitters
docs
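The subject rules above amount to a conventional-commit check over the allowed types and scopes. A minimal sketch of such a check, assuming a regex of this shape — the workflow's actual rule set is only partially shown here, so the type list (beyond `release`) and the `subject_ok` helper are illustrative:

```python
import re

# Hypothetical check mirroring the rules above: a conventional-commit
# subject like "fix(core): handle empty scopes". The exact pattern the
# workflow enforces is not fully shown; this regex is an assumption.
SUBJECT_RE = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert|release)"
    r"(\([a-z0-9_-]+\))?: [a-z].+$"
)

def subject_ok(subject: str) -> bool:
    """Return True if the commit subject matches the assumed pattern."""
    return bool(SUBJECT_RE.match(subject))

print(subject_ok("fix(core): handle empty scopes"))     # True
print(subject_ok("Fix(core): handle empty scopes"))     # False: uppercase type
print(subject_ok("feat(langchain_v1): add new agent"))  # True
```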


@@ -22,9 +22,7 @@ Example scenarios with mitigation strategies:
* A user may ask an agent with write access to an external API to write malicious data to the API, or delete data from that API. To mitigate, give the agent read-only API keys, or limit it to only use endpoints that are already resistant to such misuse.
* A user may ask an agent with access to a database to drop a table or mutate the schema. To mitigate, scope the credentials to only the tables that the agent needs to access and consider issuing READ-ONLY credentials.
If you're building applications that access external resources like file systems, APIs
or databases, consider speaking with your company's security team to determine how to best
design and secure your applications.
If you're building applications that access external resources like file systems, APIs or databases, consider speaking with your company's security team to determine how to best design and secure your applications.
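The read-only-credentials mitigation above can be sketched with SQLite's read-only URI mode standing in for scoped database credentials — a minimal illustration, not a substitute for real credential scoping:

```python
import os
import sqlite3
import tempfile

# Set up a small database the way an administrator would.
path = os.path.join(tempfile.mkdtemp(), "data.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE docs (body TEXT)")
rw.execute("INSERT INTO docs VALUES ('hello')")
rw.commit()
rw.close()

# The agent is only given a read-only handle: reads succeed, writes fail.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ro.execute("SELECT body FROM docs").fetchone())  # ('hello',)
try:
    ro.execute("DROP TABLE docs")
except sqlite3.OperationalError as e:
    print("write blocked:", e)
```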
## Reporting OSS Vulnerabilities
@@ -38,9 +36,7 @@ Before reporting a vulnerability, please review:
1) In-Scope Targets and Out-of-Scope Targets below.
2) The [langchain-ai/langchain](https://python.langchain.com/docs/contributing/repo_structure) monorepo structure.
3) The [Best Practices](#best-practices) above to
understand what we consider to be a security vulnerability vs. developer
responsibility.
3) The [Best Practices](#best-practices) above to understand what we consider to be a security vulnerability vs. developer responsibility.
### In-Scope Targets


@@ -0,0 +1,72 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ScraperAPI\n",
"\n",
"[ScraperAPI](https://www.scraperapi.com/) enables data collection from any public website with its web scraping API, without worrying about proxies, browsers, or CAPTCHA handling. [langchain-scraperapi](https://github.com/scraperapi/langchain-scraperapi) wraps this service, making it easy for AI agents to browse the web and scrape data from it.\n",
"\n",
"## Installation and Setup\n",
"\n",
"- Install the Python package with `pip install langchain-scraperapi`.\n",
"- Obtain an API key from [ScraperAPI](https://www.scraperapi.com/) and set the environment variable `SCRAPERAPI_API_KEY`.\n",
"\n",
"### Tools\n",
"\n",
"The package offers 3 tools to scrape any website, get structured Google search results, and get structured Amazon search results respectively.\n",
"\n",
"To import them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install langchain_scraperapi\n",
"\n",
"from langchain_scraperapi.tools import (\n",
" ScraperAPIAmazonSearchTool,\n",
" ScraperAPIGoogleSearchTool,\n",
" ScraperAPITool,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Example use:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tool = ScraperAPITool()\n",
"\n",
"result = tool.invoke({\"url\": \"https://example.com\", \"output_format\": \"markdown\"})\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a more detailed walkthrough of how to use these tools, visit the [official repository](https://github.com/scraperapi/langchain-scraperapi)."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
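Under the hood, a wrapper like `ScraperAPITool` presumably forwards these parameters to ScraperAPI's HTTP endpoint. A minimal sketch of building such a request, assuming the documented `https://api.scraperapi.com/` query interface — the exact parameter names the wrapper forwards are assumptions:

```python
import os
from urllib.parse import urlencode

# Sketch of the HTTP request the ScraperAPITool presumably makes, assuming
# ScraperAPI's documented query interface at api.scraperapi.com. Which
# parameters the wrapper forwards verbatim is an assumption.
def build_scrape_url(target_url: str, **params: str) -> str:
    query = {
        "api_key": os.environ.get("SCRAPERAPI_API_KEY", "your-api-key"),
        "url": target_url,
        **params,
    }
    return "https://api.scraperapi.com/?" + urlencode(query)

request_url = build_scrape_url("https://example.com", output_format="markdown")
print(request_url)
```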


@@ -0,0 +1,329 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d3a12ba8",
"metadata": {},
"source": [
"# LangChain ScraperAPI\n",
"\n",
"Give your AI agent the ability to browse websites, search Google and Amazon in just two lines of code.\n",
"\n",
"The `langchain-scraperapi` package adds three ready-to-use LangChain tools backed by the [ScraperAPI](https://www.scraperapi.com/) service:\n",
"\n",
"| Tool class | Use it to |\n",
"|------------|------------------|\n",
"| `ScraperAPITool` | Grab the HTML/text/markdown of any web page |\n",
"| `ScraperAPIGoogleSearchTool` | Get structured Google Search SERP data |\n",
"| `ScraperAPIAmazonSearchTool` | Get structured Amazon product-search data |\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"| Package | Serializable | [JS support](https://js.langchain.com/docs/integrations/tools/__module_name__) | Package latest |\n",
"| :--- | :---: | :---: | :---: |\n",
"| [langchain-scraperapi](https://pypi.org/project/langchain-scraperapi/) | ❌ | ❌ | v0.1.1 |"
]
},
{
"cell_type": "markdown",
"id": "d1f7c70f",
"metadata": {},
"source": [
"### Setup\n",
"\n",
"Install the `langchain-scraperapi` package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "494ecbc3",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain-scraperapi"
]
},
{
"cell_type": "markdown",
"id": "c111d2fb",
"metadata": {},
"source": [
"### Credentials\n",
"\n",
"Create an account at https://www.scraperapi.com/ and get an API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d315465",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"SCRAPERAPI_API_KEY\"] = \"your-api-key\""
]
},
{
"cell_type": "markdown",
"id": "e06ffe48",
"metadata": {},
"source": [
"## Instantiation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "27ae5612",
"metadata": {},
"outputs": [],
"source": [
"from langchain_scraperapi.tools import ScraperAPITool\n",
"\n",
"tool = ScraperAPITool()"
]
},
{
"cell_type": "markdown",
"id": "9ff46136",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e1a4c7f",
"metadata": {},
"outputs": [],
"source": [
"output = tool.invoke(\n",
" {\n",
" \"url\": \"https://langchain.com\",\n",
" \"output_format\": \"markdown\",\n",
" \"render\": True,\n",
" }\n",
")\n",
"print(output)"
]
},
{
"cell_type": "markdown",
"id": "051ef7b1",
"metadata": {},
"source": [
"## Features\n",
"\n",
"### 1. `ScraperAPITool` — browse any website\n",
"\n",
"Invoke the *raw* ScraperAPI endpoint and get HTML, rendered DOM, text, or markdown.\n",
"\n",
"**Invocation arguments**\n",
"\n",
"* **`url`** **(required)** target page URL \n",
"* **Optional (mirror ScraperAPI query params)** \n",
" * `output_format`: `\"text\"` | `\"markdown\"` (default returns raw HTML) \n",
" * `country_code`: e.g. `\"us\"`, `\"de\"` \n",
" * `device_type`: `\"desktop\"` | `\"mobile\"` \n",
" * `premium`: `bool` use premium proxies \n",
" * `render`: `bool` run JS before returning HTML \n",
" * `keep_headers`: `bool` include response headers \n",
" \n",
"For the complete set of modifiers see the [ScraperAPI request-customisation docs](https://docs.scraperapi.com/python/making-requests/customizing-requests)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a0c7cc2",
"metadata": {},
"outputs": [],
"source": [
"from langchain_scraperapi.tools import ScraperAPITool\n",
"\n",
"tool = ScraperAPITool()\n",
"\n",
"html_text = tool.invoke(\n",
" {\n",
" \"url\": \"https://langchain.com\",\n",
" \"output_format\": \"markdown\",\n",
" \"render\": True,\n",
" }\n",
")\n",
"print(html_text[:300], \"…\")"
]
},
{
"cell_type": "markdown",
"id": "9f2947dd",
"metadata": {},
"source": [
"### 2. `ScraperAPIGoogleSearchTool` — structured Google Search\n",
"\n",
"Structured SERP data via `/structured/google/search`.\n",
"\n",
"**Invocation arguments**\n",
"\n",
"* **`query`** **(required)** natural-language search string \n",
"* **Optional** — `country_code`, `tld`, `uule`, `hl`, `gl`, `ie`, `oe`, `start`, `num` \n",
"* `output_format`: `\"json\"` (default) or `\"csv\"`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aeac1195",
"metadata": {},
"outputs": [],
"source": [
"from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool\n",
"\n",
"google_search = ScraperAPIGoogleSearchTool()\n",
"\n",
"results = google_search.invoke(\n",
" {\n",
" \"query\": \"what is langchain\",\n",
" \"num\": 20,\n",
" \"output_format\": \"json\",\n",
" }\n",
")\n",
"print(results)"
]
},
{
"cell_type": "markdown",
"id": "3dc2f845",
"metadata": {},
"source": [
"### 3. `ScraperAPIAmazonSearchTool` — structured Amazon Search\n",
"\n",
"Structured product results via `/structured/amazon/search`.\n",
"\n",
"**Invocation arguments**\n",
"\n",
"* **`query`** **(required)** product search terms \n",
"* **Optional** — `country_code`, `tld`, `page` \n",
"* `output_format`: `\"json\"` (default) or `\"csv\"`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05a4a6ed",
"metadata": {},
"outputs": [],
"source": [
"from langchain_scraperapi.tools import ScraperAPIAmazonSearchTool\n",
"\n",
"amazon_search = ScraperAPIAmazonSearchTool()\n",
"\n",
"products = amazon_search.invoke(\n",
" {\n",
" \"query\": \"noise cancelling headphones\",\n",
" \"tld\": \"co.uk\",\n",
" \"page\": 2,\n",
" }\n",
")\n",
"print(products)"
]
},
{
"cell_type": "markdown",
"id": "607eb8c8",
"metadata": {},
"source": [
"## Use within an agent\n",
"\n",
"Here is an example of using the tools in an AI agent. The `ScraperAPITool` gives the AI the ability to browse any website, summarize articles, and click on links to navigate between pages."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6541b286",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain-openai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb62e921",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.agents import AgentExecutor, create_tool_calling_agent\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_openai import ChatOpenAI\n",
"from langchain_scraperapi.tools import ScraperAPITool\n",
"\n",
"os.environ[\"SCRAPERAPI_API_KEY\"] = \"your-api-key\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"your-api-key\"\n",
"\n",
"tools = [ScraperAPITool(output_format=\"markdown\")]\n",
"llm = ChatOpenAI(model_name=\"gpt-4o\", temperature=0)\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that can browse websites for users. When asked to browse a website or a link, do so with the ScraperAPITool, then provide information based on the website based on the user's needs.\",\n",
" ),\n",
" (\"human\", \"{input}\"),\n",
" MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n",
" ]\n",
")\n",
"\n",
"agent = create_tool_calling_agent(llm, tools, prompt)\n",
"agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)\n",
"response = agent_executor.invoke(\n",
" {\"input\": \"can you browse hacker news and summarize the first website\"}\n",
")"
]
},
{
"cell_type": "markdown",
"id": "4e90c894",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"Below you can find more information on additional parameters to the tools to customize your requests.\n",
"\n",
"* [ScraperAPITool](https://docs.scraperapi.com/python/making-requests/customizing-requests)\n",
"* [ScraperAPIGoogleSearchTool](https://docs.scraperapi.com/python/make-requests-with-scraperapi-in-python/scraperapi-structured-data-collection-in-python/google-serp-api-structured-data-in-python)\n",
"* [ScraperAPIAmazonSearchTool](https://docs.scraperapi.com/python/make-requests-with-scraperapi-in-python/scraperapi-structured-data-collection-in-python/amazon-search-api-structured-data-in-python)\n",
"\n",
"The LangChain wrappers surface these parameters directly."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -684,7 +684,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"id": "a13462d0-2d02-4474-921e-15a1ba1fa274",
"metadata": {},
"outputs": [
@@ -702,16 +702,15 @@
}
],
"source": [
"input_message = {\"role\": \"user\", \"content\": \"Hi, I'm Bob!\"}\n",
"for step in agent_executor.stream(\n",
" {\"messages\": [input_message]}, config, stream_mode=\"values\"\n",
" {\"messages\": [(\"user\", \"Hi, I'm Bob!\")]}, config, stream_mode=\"values\"\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": null,
"id": "56d8028b-5dbc-40b2-86f5-ed60631d86a3",
"metadata": {},
"outputs": [
@@ -729,9 +728,8 @@
}
],
"source": [
"input_message = {\"role\": \"user\", \"content\": \"What's my name?\"}\n",
"for step in agent_executor.stream(\n",
" {\"messages\": [input_message]}, config, stream_mode=\"values\"\n",
" {\"messages\": [(\"user\", \"What is my name?\")]}, config, stream_mode=\"values\"\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
@@ -754,7 +752,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": null,
"id": "24460239",
"metadata": {},
"outputs": [
@@ -775,9 +773,8 @@
"# highlight-next-line\n",
"config = {\"configurable\": {\"thread_id\": \"xyz123\"}}\n",
"\n",
"input_message = {\"role\": \"user\", \"content\": \"What's my name?\"}\n",
"for step in agent_executor.stream(\n",
" {\"messages\": [input_message]}, config, stream_mode=\"values\"\n",
" {\"messages\": [(\"user\", \"What is my name?\")]}, config, stream_mode=\"values\"\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]


@@ -39,9 +39,6 @@ test_integration = []
langchain-core = { path = "../core", editable = true }
langchain = { path = "../langchain", editable = true }
[tool.ruff]
target-version = "py39"
[tool.ruff.format]
docstring-code-format = true


@@ -749,3 +749,6 @@ packages:
- name: langchain-zeusdb
repo: zeusdb/langchain-zeusdb
path: libs/zeusdb
- name: langchain-scraperapi
repo: scraperapi/langchain-scraperapi
path: .