Merge branch 'master' into cc/oai_responses

Bagatur 2025-03-12 04:23:20 -07:00
commit 7b09f2fb1e
11 changed files with 1499 additions and 41 deletions


@@ -0,0 +1,265 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "wkUAAcGZNSJ3"
},
"source": [
"# AgentQLLoader\n",
"\n",
"[AgentQL](https://www.agentql.com/)'s document loader provides structured data extraction from any web page using an [AgentQL query](https://docs.agentql.com/agentql-query). AgentQL can be used across multiple languages and web pages without breaking over time and change.\n",
"\n",
"## Overview\n",
"\n",
"`AgentQLLoader` requires the following two parameters:\n",
"- `url`: The URL of the web page you want to extract data from.\n",
"- `query`: The AgentQL query to execute. Learn more about [how to write an AgentQL query in the docs](https://docs.agentql.com/agentql-query) or test one out in the [AgentQL Playground](https://dev.agentql.com/playground).\n",
"\n",
"Setting the following parameters are optional:\n",
"- `api_key`: Your AgentQL API key from [dev.agentql.com](https://dev.agentql.com). **`Optional`.**\n",
"- `timeout`: The number of seconds to wait for a request before timing out. **Defaults to `900`.**\n",
"- `is_stealth_mode_enabled`: Whether to enable experimental anti-bot evasion strategies. This feature may not work for all websites at all times. Data extraction may take longer to complete with this mode enabled. **Defaults to `False`.**\n",
"- `wait_for`: The number of seconds to wait for the page to load before extracting data. **Defaults to `0`.**\n",
"- `is_scroll_to_bottom_enabled`: Whether to scroll to bottom of the page before extracting data. **Defaults to `False`.**\n",
"- `mode`: `\"standard\"` uses deep data analysis, while `\"fast\"` trades some depth of analysis for speed and is adequate for most usecases. [Learn more about the modes in this guide.](https://docs.agentql.com/accuracy/standard-mode) **Defaults to `\"fast\"`.**\n",
"- `is_screenshot_enabled`: Whether to take a screenshot before extracting data. Returned in 'metadata' as a Base64 string. **Defaults to `False`.**\n",
"\n",
"AgentQLLoader is implemented with AgentQL's [REST API](https://docs.agentql.com/rest-api/api-reference)\n",
"\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| AgentQLLoader| langchain-agentql | ✅ | ❌ | ❌ |\n",
"\n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: |\n",
"| AgentQLLoader | ✅ | ❌ |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CaKa2QrnwPXq"
},
"source": [
"## Setup\n",
"\n",
"To use the AgentQL Document Loader, you will need to configure the `AGENTQL_API_KEY` environment variable, or use the `api_key` parameter. You can acquire an API key from our [Dev Portal](https://dev.agentql.com)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mZNJvUQBNSJ5"
},
"source": [
"### Installation\n",
"\n",
"Install **langchain-agentql**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "IblRoJJDNSJ5"
},
"outputs": [],
"source": [
"%pip install -qU langchain_agentql"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SNsUT60YvfCm"
},
"source": [
"### Set Credentials"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "2D1EN7Egvk1c"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"AGENTQL_API_KEY\"] = \"YOUR_AGENTQL_API_KEY\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D4hnJV_6NSJ5"
},
"source": [
"## Initialization\n",
"\n",
"Next instantiate your model object:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "oMJdxL_KNSJ5"
},
"outputs": [],
"source": [
"from langchain_agentql.document_loaders import AgentQLLoader\n",
"\n",
"loader = AgentQLLoader(\n",
" url=\"https://www.agentql.com/blog\",\n",
" query=\"\"\"\n",
" {\n",
" posts[] {\n",
" title\n",
" url\n",
" date\n",
" author\n",
" }\n",
" }\n",
" \"\"\",\n",
" is_scroll_to_bottom_enabled=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SRxIOx90NSJ5"
},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "bNnnCZ1oNSJ5",
"outputId": "d0eb8cb4-9742-4f0c-80f1-0509a3af1808"
},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}, page_content=\"{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQLs REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}\")"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wtPMNh72NSJ5",
"outputId": "59d529a4-3c22-445c-f5cf-dc7b24168906"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7RMuEwl4NSJ5"
},
"source": [
"## Lazy Load\n",
"\n",
"`AgentQLLoader` currently only loads one `Document` at a time. Therefore, `load()` and `lazy_load()` behave the same:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FIYddZBONSJ5",
"outputId": "c39a7a6d-bc52-4ef9-b36f-e1d138590b79"
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'request_id': '06273abd-b2ef-4e15-b0ec-901cba7b4825', 'generated_query': None, 'screenshot': None}, page_content=\"{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQLs REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}\")]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pages = [doc for doc in loader.lazy_load()]\n",
"pages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For more information on how to use this integration, please refer to the [git repo](https://github.com/tinyfish-io/agentql-integrations/tree/main/langchain) or the [langchain integration documentation](https://docs.agentql.com/integrations/langchain)"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}


@@ -0,0 +1,35 @@
# AgentQL
[AgentQL](https://www.agentql.com/) provides web interaction and structured data extraction from any web page using an [AgentQL query](https://docs.agentql.com/agentql-query) or a natural language prompt. AgentQL can be used across multiple languages and web pages without breaking as pages change over time.
## Installation and Setup
Install the integration package:
```bash
pip install langchain-agentql
```
## API Key
Get an API Key from our [Dev Portal](https://dev.agentql.com/) and add it to your environment variables:
```bash
export AGENTQL_API_KEY="your-api-key-here"
```
## DocumentLoader
AgentQL's document loader provides structured data extraction from any web page using an AgentQL query.
```python
from langchain_agentql.document_loaders import AgentQLLoader
```
See our [document loader documentation and usage example](/docs/integrations/document_loaders/agentql).
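A minimal usage sketch (the URL and AgentQL query below are illustrative, mirroring the notebook example linked above):
```python
from langchain_agentql.document_loaders import AgentQLLoader

# The query uses the AgentQL query language to describe the fields to extract.
loader = AgentQLLoader(
    url="https://www.agentql.com/blog",
    query="""
    {
        posts[] {
            title
            url
        }
    }
    """,
)

docs = loader.load()  # returns a list containing a single Document
print(docs[0].page_content)
print(docs[0].metadata)
```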
## Tools and Toolkits
AgentQL's tools provide web interaction and structured data extraction from any web page using an AgentQL query or a natural language prompt.
```python
from langchain_agentql.tools import ExtractWebDataTool, ExtractWebDataBrowserTool, GetWebElementBrowserTool
from langchain_agentql import AgentQLBrowserToolkit
```
See our [tools documentation and usage example](/docs/integrations/tools/agentql).
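As a rough sketch of tool usage, assuming these tools follow the standard LangChain tool interface (`invoke` with a dict of arguments) — the argument names below are assumptions that mirror the document loader's parameters, so verify them against the tools documentation:
```python
from langchain_agentql.tools import ExtractWebDataTool

# Assumed argument names ("url", "query"); check the tools docs for the
# exact input schema before relying on this sketch.
extract_tool = ExtractWebDataTool()
result = extract_tool.invoke(
    {
        "url": "https://www.agentql.com/blog",
        "query": "{ posts[] { title url } }",
    }
)
print(result)
```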

File diff suppressed because it is too large


@@ -147,6 +147,11 @@ WEBBROWSING_TOOL_FEAT_TABLE = {
"interactions": True,
"pricing": "40 free requests/day",
},
"AgentQL Toolkit": {
"link": "/docs/integrations/tools/agentql",
"interactions": True,
"pricing": "Free trial, with pay-as-you-go and flat rate plans after",
},
}
DATABASE_TOOL_FEAT_TABLE = {


@@ -819,6 +819,13 @@ const FEATURE_TABLES = {
source: "Platform for running and scaling headless browsers, can be used to scrape/crawl any site",
api: "API",
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/hyperbrowser/"
},
{
name: "AgentQL",
link: "agentql",
source: "Web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt",
api: "API",
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/agentql/"
}
]
},


@@ -56,37 +56,50 @@ def draw_mermaid(
if with_styles
else "graph TD;\n"
)
if with_styles:
    # Node formatting templates
    default_class_label = "default"
    format_dict = {default_class_label: "{0}({1})"}
    if first_node is not None:
        format_dict[first_node] = "{0}([{1}]):::first"
    if last_node is not None:
        format_dict[last_node] = "{0}([{1}]):::last"

    # Add nodes to the graph
    for key, node in nodes.items():
        node_name = node.name.split(":")[-1]
        label = (
            f"<p>{node_name}</p>"
            if node_name.startswith(tuple(MARKDOWN_SPECIAL_CHARS))
            and node_name.endswith(tuple(MARKDOWN_SPECIAL_CHARS))
            else node_name
        )
        if node.metadata:
            label = (
                f"{label}<hr/><small><em>"
                + "\n".join(
                    f"{key} = {value}" for key, value in node.metadata.items()
                )
                + "</em></small>"
            )
        node_label = format_dict.get(key, format_dict[default_class_label]).format(
            _escape_node_label(key), label
        )
        mermaid_graph += f"\t{node_label}\n"
# Group nodes by subgraph
subgraph_nodes: dict[str, dict[str, Node]] = {}
regular_nodes: dict[str, Node] = {}

for key, node in nodes.items():
    if ":" in key:
        # For nodes with colons, add them only to their deepest subgraph level
        prefix = ":".join(key.split(":")[:-1])
        subgraph_nodes.setdefault(prefix, {})[key] = node
    else:
        regular_nodes[key] = node

# Node formatting templates
default_class_label = "default"
format_dict = {default_class_label: "{0}({1})"}
if first_node is not None:
    format_dict[first_node] = "{0}([{1}]):::first"
if last_node is not None:
    format_dict[last_node] = "{0}([{1}]):::last"

def render_node(key: str, node: Node, indent: str = "\t") -> str:
    """Helper function to render a node with consistent formatting."""
    node_name = node.name.split(":")[-1]
    label = (
        f"<p>{node_name}</p>"
        if node_name.startswith(tuple(MARKDOWN_SPECIAL_CHARS))
        and node_name.endswith(tuple(MARKDOWN_SPECIAL_CHARS))
        else node_name
    )
    if node.metadata:
        label = (
            f"{label}<hr/><small><em>"
            + "\n".join(f"{k} = {value}" for k, value in node.metadata.items())
            + "</em></small>"
        )
    node_label = format_dict.get(key, format_dict[default_class_label]).format(
        _escape_node_label(key), label
    )
    return f"{indent}{node_label}\n"

# Add non-subgraph nodes to the graph
if with_styles:
    for key, node in regular_nodes.items():
        mermaid_graph += render_node(key, node)
# Group edges by their common prefixes
edge_groups: dict[str, list[Edge]] = {}
@@ -116,6 +129,11 @@
seen_subgraphs.add(subgraph)
mermaid_graph += f"\tsubgraph {subgraph}\n"
# Add nodes that belong to this subgraph
if with_styles and prefix in subgraph_nodes:
for key, node in subgraph_nodes[prefix].items():
mermaid_graph += render_node(key, node)
for edge in edges:
source, target = edge.source, edge.target
@@ -156,11 +174,25 @@
# Start with the top-level edges (no common prefix)
add_subgraph(edge_groups.get("", []), "")
# Add remaining subgraphs
# Add remaining subgraphs with edges
for prefix in edge_groups:
if ":" in prefix or prefix == "":
continue
add_subgraph(edge_groups[prefix], prefix)
seen_subgraphs.add(prefix)
# Add empty subgraphs (subgraphs with no internal edges)
if with_styles:
for prefix in subgraph_nodes:
if ":" not in prefix and prefix not in seen_subgraphs:
mermaid_graph += f"\tsubgraph {prefix}\n"
# Add nodes that belong to this subgraph
for key, node in subgraph_nodes[prefix].items():
mermaid_graph += render_node(key, node)
mermaid_graph += "\tend\n"
seen_subgraphs.add(prefix)
# Add custom styles for nodes
if with_styles:


@@ -17,7 +17,7 @@ dependencies = [
"pydantic<3.0.0,>=2.7.4; python_full_version >= \"3.12.4\"",
]
name = "langchain-core"
version = "0.3.43"
version = "0.3.44"
description = "Building applications with LLMs through composability"
readme = "README.md"


@@ -5,9 +5,6 @@
graph TD;
__start__([<p>__start__</p>]):::first
parent_1(parent_1)
child_child_1_grandchild_1(grandchild_1)
child_child_1_grandchild_2(grandchild_2<hr/><small><em>__interrupt = before</em></small>)
child_child_2(child_2)
parent_2(parent_2)
__end__([<p>__end__</p>]):::last
__start__ --> parent_1;
@@ -15,8 +12,11 @@
parent_1 --> child_child_1_grandchild_1;
parent_2 --> __end__;
subgraph child
child_child_2(child_2)
child_child_1_grandchild_2 --> child_child_2;
subgraph child_1
child_child_1_grandchild_1(grandchild_1)
child_child_1_grandchild_2(grandchild_2<hr/><small><em>__interrupt = before</em></small>)
child_child_1_grandchild_1 --> child_child_1_grandchild_2;
end
end
@@ -32,10 +32,6 @@
graph TD;
__start__([<p>__start__</p>]):::first
parent_1(parent_1)
child_child_1_grandchild_1(grandchild_1)
child_child_1_grandchild_1_greatgrandchild(greatgrandchild)
child_child_1_grandchild_2(grandchild_2<hr/><small><em>__interrupt = before</em></small>)
child_child_2(child_2)
parent_2(parent_2)
__end__([<p>__end__</p>]):::last
__start__ --> parent_1;
@@ -43,10 +39,14 @@
parent_1 --> child_child_1_grandchild_1;
parent_2 --> __end__;
subgraph child
child_child_2(child_2)
child_child_1_grandchild_2 --> child_child_2;
subgraph child_1
child_child_1_grandchild_1(grandchild_1)
child_child_1_grandchild_2(grandchild_2<hr/><small><em>__interrupt = before</em></small>)
child_child_1_grandchild_1_greatgrandchild --> child_child_1_grandchild_2;
subgraph grandchild_1
child_child_1_grandchild_1_greatgrandchild(greatgrandchild)
child_child_1_grandchild_1 --> child_child_1_grandchild_1_greatgrandchild;
end
end
@@ -1996,10 +1996,6 @@
graph TD;
__start__([<p>__start__</p>]):::first
outer_1(outer_1)
inner_1_inner_1(inner_1)
inner_1_inner_2(inner_2<hr/><small><em>__interrupt = before</em></small>)
inner_2_inner_1(inner_1)
inner_2_inner_2(inner_2)
outer_2(outer_2)
__end__([<p>__end__</p>]):::last
__start__ --> outer_1;
@@ -2009,9 +2005,13 @@
outer_1 --> inner_2_inner_1;
outer_2 --> __end__;
subgraph inner_1
inner_1_inner_1(inner_1)
inner_1_inner_2(inner_2<hr/><small><em>__interrupt = before</em></small>)
inner_1_inner_1 --> inner_1_inner_2;
end
subgraph inner_2
inner_2_inner_1(inner_1)
inner_2_inner_2(inner_2)
inner_2_inner_1 --> inner_2_inner_2;
end
classDef default fill:#f2f0ff,line-height:1.2
@@ -2020,6 +2020,23 @@
'''
# ---
# name: test_single_node_subgraph_mermaid[mermaid]
'''
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
__start__([<p>__start__</p>]):::first
__end__([<p>__end__</p>]):::last
__start__ --> sub_meow;
sub_meow --> __end__;
subgraph sub
sub_meow(meow)
end
classDef default fill:#f2f0ff,line-height:1.2
classDef first fill-opacity:0
classDef last fill:#bfb6fc
'''
# ---
# name: test_trim
dict({
'edges': list([


@@ -448,6 +448,23 @@ def test_triple_nested_subgraph_mermaid(snapshot: SnapshotAssertion) -> None:
    assert graph.draw_mermaid() == snapshot(name="mermaid")


def test_single_node_subgraph_mermaid(snapshot: SnapshotAssertion) -> None:
    empty_data = BaseModel
    nodes = {
        "__start__": Node(
            id="__start__", name="__start__", data=empty_data, metadata=None
        ),
        "sub:meow": Node(id="sub:meow", name="meow", data=empty_data, metadata=None),
        "__end__": Node(id="__end__", name="__end__", data=empty_data, metadata=None),
    }
    edges = [
        Edge(source="__start__", target="sub:meow", data=None, conditional=False),
        Edge(source="sub:meow", target="__end__", data=None, conditional=False),
    ]
    graph = Graph(nodes, edges)
    assert graph.draw_mermaid() == snapshot(name="mermaid")


def test_runnable_get_graph_with_invalid_input_type() -> None:
    """Test that error isn't raised when getting graph with invalid input type."""


@@ -935,7 +935,7 @@ wheels = [
[[package]]
name = "langchain-core"
version = "0.3.43"
version = "0.3.44"
source = { editable = "." }
dependencies = [
{ name = "jsonpatch" },


@@ -513,3 +513,6 @@ packages:
  - name: langchain-opengradient
    path: .
    repo: OpenGradient/og-langchain
  - name: langchain-agentql
    path: langchain
    repo: tinyfish-io/agentql-integrations