docs: Add AgentQL provider doc, tool/toolkit doc and documentloader doc (#30144)

- **Description:** Added AgentQL docs for the provider page, tools page
and documentloader page
- **Twitter handle:** @AgentQL

Repo:
https://github.com/tinyfish-io/agentql-integrations/tree/main/langchain
PyPI: https://pypi.org/project/langchain-agentql/

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Jason Zhang 2025-03-11 18:57:40 -07:00 committed by GitHub
parent 23fa70f328
commit 49bdd3b6fe
6 changed files with 1392 additions and 0 deletions

@ -0,0 +1,265 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "wkUAAcGZNSJ3"
},
"source": [
"# AgentQLLoader\n",
"\n",
"[AgentQL](https://www.agentql.com/)'s document loader provides structured data extraction from any web page using an [AgentQL query](https://docs.agentql.com/agentql-query). AgentQL can be used across multiple languages and web pages without breaking as those pages change over time.\n",
"\n",
"## Overview\n",
"\n",
"`AgentQLLoader` requires the following two parameters:\n",
"- `url`: The URL of the web page you want to extract data from.\n",
"- `query`: The AgentQL query to execute. Learn more about [how to write an AgentQL query in the docs](https://docs.agentql.com/agentql-query) or test one out in the [AgentQL Playground](https://dev.agentql.com/playground).\n",
"\n",
"The following parameters are optional:\n",
"- `api_key`: Your AgentQL API key from [dev.agentql.com](https://dev.agentql.com). **Optional if the `AGENTQL_API_KEY` environment variable is set.**\n",
"- `timeout`: The number of seconds to wait for a request before timing out. **Defaults to `900`.**\n",
"- `is_stealth_mode_enabled`: Whether to enable experimental anti-bot evasion strategies. This feature may not work for all websites at all times. Data extraction may take longer to complete with this mode enabled. **Defaults to `False`.**\n",
"- `wait_for`: The number of seconds to wait for the page to load before extracting data. **Defaults to `0`.**\n",
"- `is_scroll_to_bottom_enabled`: Whether to scroll to bottom of the page before extracting data. **Defaults to `False`.**\n",
"- `mode`: `\"standard\"` uses deep data analysis, while `\"fast\"` trades some depth of analysis for speed and is adequate for most use cases. [Learn more about the modes in this guide.](https://docs.agentql.com/accuracy/standard-mode) **Defaults to `\"fast\"`.**\n",
"- `is_screenshot_enabled`: Whether to take a screenshot before extracting data. The screenshot is returned in `metadata` as a Base64 string. **Defaults to `False`.**\n",
"\n",
"`AgentQLLoader` is implemented with AgentQL's [REST API](https://docs.agentql.com/rest-api/api-reference).\n",
"\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| AgentQLLoader | langchain-agentql | ✅ | ❌ | ❌ |\n",
"\n",
"### Loader features\n",
"\n",
"| Source | Document Lazy Loading | Native Async Support |\n",
"| :---: | :---: | :---: |\n",
"| AgentQLLoader | ✅ | ❌ |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CaKa2QrnwPXq"
},
"source": [
"## Setup\n",
"\n",
"To use the AgentQL Document Loader, you will need to configure the `AGENTQL_API_KEY` environment variable, or use the `api_key` parameter. You can acquire an API key from our [Dev Portal](https://dev.agentql.com)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mZNJvUQBNSJ5"
},
"source": [
"### Installation\n",
"\n",
"Install **langchain-agentql**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "IblRoJJDNSJ5"
},
"outputs": [],
"source": [
"%pip install -qU langchain_agentql"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SNsUT60YvfCm"
},
"source": [
"### Set Credentials"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "2D1EN7Egvk1c"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"AGENTQL_API_KEY\"] = \"YOUR_AGENTQL_API_KEY\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D4hnJV_6NSJ5"
},
"source": [
"## Initialization\n",
"\n",
"Next, instantiate the loader:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "oMJdxL_KNSJ5"
},
"outputs": [],
"source": [
"from langchain_agentql.document_loaders import AgentQLLoader\n",
"\n",
"loader = AgentQLLoader(\n",
" url=\"https://www.agentql.com/blog\",\n",
" query=\"\"\"\n",
" {\n",
" posts[] {\n",
" title\n",
" url\n",
" date\n",
" author\n",
" }\n",
" }\n",
" \"\"\",\n",
" is_scroll_to_bottom_enabled=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SRxIOx90NSJ5"
},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "bNnnCZ1oNSJ5",
"outputId": "d0eb8cb4-9742-4f0c-80f1-0509a3af1808"
},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}, page_content=\"{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQLs REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}\")"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wtPMNh72NSJ5",
"outputId": "59d529a4-3c22-445c-f5cf-dc7b24168906"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7RMuEwl4NSJ5"
},
"source": [
"## Lazy Load\n",
"\n",
"`AgentQLLoader` currently only loads one `Document` at a time. Therefore, `load()` and `lazy_load()` behave the same:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FIYddZBONSJ5",
"outputId": "c39a7a6d-bc52-4ef9-b36f-e1d138590b79"
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'request_id': '06273abd-b2ef-4e15-b0ec-901cba7b4825', 'generated_query': None, 'screenshot': None}, page_content=\"{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQLs REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}\")]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pages = [doc for doc in loader.lazy_load()]\n",
"pages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For more information on how to use this integration, please refer to the [Git repo](https://github.com/tinyfish-io/agentql-integrations/tree/main/langchain) or the [LangChain integration documentation](https://docs.agentql.com/integrations/langchain)."
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

@ -0,0 +1,35 @@
# AgentQL
[AgentQL](https://www.agentql.com/) provides web interaction and structured data extraction from any web page using an [AgentQL query](https://docs.agentql.com/agentql-query) or a natural language prompt. AgentQL can be used across multiple languages and web pages without breaking as those pages change over time.
## Installation and Setup
Install the integration package:
```bash
pip install langchain-agentql
```
## API Key
Get an API Key from our [Dev Portal](https://dev.agentql.com/) and add it to your environment variables:
```bash
export AGENTQL_API_KEY="your-api-key-here"
```
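
If you would rather check the key from Python before constructing a loader or tool, the lookup can be sketched roughly as below. The helper name `resolve_agentql_api_key` is hypothetical and is not part of `langchain-agentql`; it only illustrates the "explicit key first, then `AGENTQL_API_KEY`" order described here, and assumes nothing about how the package itself resolves the key.

```python
import os
from typing import Optional


def resolve_agentql_api_key(explicit_key: Optional[str] = None) -> str:
    """Hypothetical helper (not a langchain-agentql API).

    Returns the explicitly passed key if there is one, otherwise the value
    of the AGENTQL_API_KEY environment variable. Raises if neither is set.
    """
    key = explicit_key or os.environ.get("AGENTQL_API_KEY", "")
    if not key.strip():
        raise RuntimeError(
            "No AgentQL API key found; set AGENTQL_API_KEY or pass a key explicitly."
        )
    return key.strip()
```

Failing early like this tends to give a clearer error than a rejected request later on.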
## DocumentLoader
AgentQL's document loader provides structured data extraction from any web page using an AgentQL query.
```python
from langchain_agentql.document_loaders import AgentQLLoader
```
See our [document loader documentation and usage example](/docs/integrations/document_loaders/agentql).
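
In the notebook output included in this change, `page_content` comes back as a string that looks like the `repr` of a Python dict (single-quoted keys and values). Assuming that format holds, which may not be true for every version of the package, a minimal sketch for turning it back into structured data could look like this. The helper names and the `sample` string are illustrative only; the sample is copied from one entry of the notebook output.

```python
import ast
from typing import Any, Dict, List


def parse_page_content(page_content: str) -> Dict[str, Any]:
    """Parse a dict-repr string into a dict.

    Assumption: page_content is a Python-literal string such as
    "{'posts': [...]}". ast.literal_eval only evaluates literals,
    so it does not execute arbitrary code.
    """
    data = ast.literal_eval(page_content)
    if not isinstance(data, dict):
        raise ValueError("expected the extracted data to be a dict")
    return data


def post_titles(page_content: str) -> List[str]:
    """Collect the 'title' field of every entry under 'posts'."""
    posts = parse_page_content(page_content).get("posts", [])
    return [post["title"] for post in posts]


# Sample string shaped like docs[0].page_content in the notebook output.
sample = (
    "{'posts': [{'title': 'Updates to Our Pricing Model', "
    "'url': 'https://www.agentql.com/blog/2024-pricing-update', "
    "'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}]}"
)
print(post_titles(sample))
```

With a real loader, the same helper would be called on `docs[0].page_content` after `loader.load()`.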
## Tools and Toolkits
AgentQL tools provide web interaction and structured data extraction from any web page using an AgentQL query or a natural language prompt.
```python
from langchain_agentql.tools import ExtractWebDataTool, ExtractWebDataBrowserTool, GetWebElementBrowserTool
from langchain_agentql import AgentQLBrowserToolkit
```
See our [tools documentation and usage example](/docs/integrations/tools/agentql).

File diff suppressed because it is too large.

@ -147,6 +147,11 @@ WEBBROWSING_TOOL_FEAT_TABLE = {
"interactions": True,
"pricing": "40 free requests/day",
},
"AgentQL Toolkit": {
"link": "/docs/integrations/tools/agentql",
"interactions": True,
"pricing": "Free trial, with pay-as-you-go and flat rate plans after",
},
}
DATABASE_TOOL_FEAT_TABLE = {

@ -819,6 +819,13 @@ const FEATURE_TABLES = {
source: "Platform for running and scaling headless browsers, can be used to scrape/crawl any site",
api: "API",
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/hyperbrowser/"
},
{
name: "AgentQL",
link: "agentql",
source: "Web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt",
api: "API",
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/agentql/"
}
]
},

@ -513,3 +513,6 @@ packages:
- name: langchain-opengradient
path: .
repo: OpenGradient/og-langchain
- name: langchain-agentql
path: langchain
repo: tinyfish-io/agentql-integrations