docs: Add AgentQL provider doc, tool/toolkit doc and documentloader doc (#30144)

- **Description:** Added AgentQL docs for the provider page, the tools page, and the document loader page.
- **Twitter handle:** @AgentQL

Repo: https://github.com/tinyfish-io/agentql-integrations/tree/main/langchain
PyPI: https://pypi.org/project/langchain-agentql/

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-07-16 17:53:37 +00:00 · 2025-03-11 18:57:40 -07:00 · 2025-03-11 18:57:40 -07:00 · 49bdd3b6fe
commit 49bdd3b6fe
parent 23fa70f328
6 changed files with 1392 additions and 0 deletions
--- a/docs/docs/integrations/document_loaders/agentql.ipynb
+++ b/docs/docs/integrations/document_loaders/agentql.ipynb
@ -0,0 +1,265 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "wkUAAcGZNSJ3"
+      },
+      "source": [
+        "# AgentQLLoader\n",
+        "\n",
+        "[AgentQL](https://www.agentql.com/)'s document loader provides structured data extraction from any web page using an [AgentQL query](https://docs.agentql.com/agentql-query). AgentQL queries can be reused across multiple languages and web pages without breaking as pages change over time.\n",
+        "\n",
+        "## Overview\n",
+        "\n",
+        "`AgentQLLoader` requires the following two parameters:\n",
+        "- `url`: The URL of the web page you want to extract data from.\n",
+        "- `query`: The AgentQL query to execute. Learn more about [how to write an AgentQL query in the docs](https://docs.agentql.com/agentql-query) or test one out in the [AgentQL Playground](https://dev.agentql.com/playground).\n",
+        "\n",
+        "The following parameters are optional:\n",
+        "- `api_key`: Your AgentQL API key from [dev.agentql.com](https://dev.agentql.com). **Optional if the `AGENTQL_API_KEY` environment variable is set.**\n",
+        "- `timeout`: The number of seconds to wait for a request before timing out. **Defaults to `900`.**\n",
+        "- `is_stealth_mode_enabled`: Whether to enable experimental anti-bot evasion strategies. This feature may not work for all websites at all times. Data extraction may take longer to complete with this mode enabled. **Defaults to `False`.**\n",
+        "- `wait_for`: The number of seconds to wait for the page to load before extracting data. **Defaults to `0`.**\n",
+        "- `is_scroll_to_bottom_enabled`: Whether to scroll to the bottom of the page before extracting data. **Defaults to `False`.**\n",
+        "- `mode`: `\"standard\"` uses deep data analysis, while `\"fast\"` trades some depth of analysis for speed and is adequate for most use cases. [Learn more about the modes in this guide.](https://docs.agentql.com/accuracy/standard-mode) **Defaults to `\"fast\"`.**\n",
+        "- `is_screenshot_enabled`: Whether to take a screenshot before extracting data. The screenshot is returned in `metadata` as a Base64 string. **Defaults to `False`.**\n",
+        "\n",
+        "`AgentQLLoader` is implemented with AgentQL's [REST API](https://docs.agentql.com/rest-api/api-reference).\n",
+        "\n",
+        "### Integration details\n",
+        "\n",
+        "| Class | Package | Local | Serializable | JS support |\n",
+        "| :--- | :--- | :---: | :---: |  :---: |\n",
+        "| AgentQLLoader | langchain-agentql | ✅ | ❌ | ❌ |\n",
+        "\n",
+        "### Loader features\n",
+        "| Source | Document Lazy Loading | Native Async Support |\n",
+        "| :---: | :---: | :---: |\n",
+        "| AgentQLLoader | ✅ | ❌ |"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "CaKa2QrnwPXq"
+      },
+      "source": [
+        "## Setup\n",
+        "\n",
+        "To use the AgentQL Document Loader, you will need to configure the `AGENTQL_API_KEY` environment variable, or use the `api_key` parameter. You can acquire an API key from our [Dev Portal](https://dev.agentql.com)."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "mZNJvUQBNSJ5"
+      },
+      "source": [
+        "### Installation\n",
+        "\n",
+        "Install **langchain-agentql**."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "IblRoJJDNSJ5"
+      },
+      "outputs": [],
+      "source": [
+        "%pip install -qU langchain-agentql"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "SNsUT60YvfCm"
+      },
+      "source": [
+        "### Set Credentials"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "id": "2D1EN7Egvk1c"
+      },
+      "outputs": [],
+      "source": [
+        "import os\n",
+        "\n",
+        "os.environ[\"AGENTQL_API_KEY\"] = \"YOUR_AGENTQL_API_KEY\""
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "D4hnJV_6NSJ5"
+      },
+      "source": [
+        "## Initialization\n",
+        "\n",
+        "Next, instantiate the loader:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 4,
+      "metadata": {
+        "id": "oMJdxL_KNSJ5"
+      },
+      "outputs": [],
+      "source": [
+        "from langchain_agentql.document_loaders import AgentQLLoader\n",
+        "\n",
+        "loader = AgentQLLoader(\n",
+        "    url=\"https://www.agentql.com/blog\",\n",
+        "    query=\"\"\"\n",
+        "    {\n",
+        "        posts[] {\n",
+        "            title\n",
+        "            url\n",
+        "            date\n",
+        "            author\n",
+        "        }\n",
+        "    }\n",
+        "    \"\"\",\n",
+        "    is_scroll_to_bottom_enabled=True,\n",
+        ")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "SRxIOx90NSJ5"
+      },
+      "source": [
+        "## Load"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 5,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "bNnnCZ1oNSJ5",
+        "outputId": "d0eb8cb4-9742-4f0c-80f1-0509a3af1808"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "Document(metadata={'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}, page_content=\"{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQL’s REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}\")"
+            ]
+          },
+          "execution_count": 5,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "docs = loader.load()\n",
+        "docs[0]"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 6,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "wtPMNh72NSJ5",
+        "outputId": "59d529a4-3c22-445c-f5cf-dc7b24168906"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "{'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}\n"
+          ]
+        }
+      ],
+      "source": [
+        "print(docs[0].metadata)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "7RMuEwl4NSJ5"
+      },
+      "source": [
+        "## Lazy Load\n",
+        "\n",
+        "`AgentQLLoader` currently loads only one `Document` at a time, so `lazy_load()` yields the same single document that `load()` returns in a list:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 7,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "FIYddZBONSJ5",
+        "outputId": "c39a7a6d-bc52-4ef9-b36f-e1d138590b79"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "[Document(metadata={'request_id': '06273abd-b2ef-4e15-b0ec-901cba7b4825', 'generated_query': None, 'screenshot': None}, page_content=\"{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQL’s REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}\")]"
+            ]
+          },
+          "execution_count": 7,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "pages = list(loader.lazy_load())\n",
+        "pages"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## API reference\n",
+        "\n",
+        "For more information on how to use this integration, see the [GitHub repository](https://github.com/tinyfish-io/agentql-integrations/tree/main/langchain) or the [LangChain integration documentation](https://docs.agentql.com/integrations/langchain)."
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "display_name": "Python 3 (ipykernel)",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.11.9"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
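In the recorded notebook output above, `page_content` is a string holding a Python-style dict literal (single-quoted keys), so `json.loads` would likely reject it. A minimal sketch of one way to get structured data back, assuming the string always has that shape; the helper name `parse_page_content` is hypothetical and not part of `langchain-agentql`, and the sample string is shortened from the output shown in the notebook:

```python
import ast

# Sample shortened from the notebook output; in practice this string would
# come from `docs[0].page_content`.
page_content = (
    "{'posts': [{'title': 'Updates to Our Pricing Model', "
    "'url': 'https://www.agentql.com/blog/2024-pricing-update', "
    "'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, "
    "{'title': 'Introducing Scheduled Scraping Workflows', "
    "'url': 'https://www.agentql.com/blog/scheduling', "
    "'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}]}"
)


def parse_page_content(text: str) -> dict:
    """Hypothetical helper: parse a dict-literal string into a dict.

    ast.literal_eval only evaluates Python literals, so it does not
    execute arbitrary code the way eval() would.
    """
    data = ast.literal_eval(text)
    if not isinstance(data, dict):
        raise ValueError("expected page_content to contain a dict literal")
    return data


posts = parse_page_content(page_content)["posts"]
titles = [post["title"] for post in posts]
print(titles)
```

If a future version of the loader emits JSON instead, `json.loads` would be the more natural choice; the sketch only reflects the output format recorded in this notebook.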
--- a/docs/docs/integrations/providers/agentql.mdx
+++ b/docs/docs/integrations/providers/agentql.mdx
@ -0,0 +1,35 @@
+# AgentQL
+
+[AgentQL](https://www.agentql.com/) provides web interaction and structured data extraction from any web page using an [AgentQL query](https://docs.agentql.com/agentql-query) or a natural language prompt. AgentQL queries can be reused across multiple languages and web pages without breaking as pages change over time.
+
+## Installation and Setup
+
+Install the integration package:
+
+```bash
+pip install langchain-agentql
+```
+
+## API Key
+
+Get an API Key from our [Dev Portal](https://dev.agentql.com/) and add it to your environment variables:
+```bash
+export AGENTQL_API_KEY="your-api-key-here"
+```
+
+## DocumentLoader
+AgentQL's document loader provides structured data extraction from any web page using an AgentQL query.
+
+```python
+from langchain_agentql.document_loaders import AgentQLLoader
+```
+See our [document loader documentation and usage example](/docs/integrations/document_loaders/agentql).
+
+## Tools and Toolkits
+AgentQL tools provide web interaction and structured data extraction from any web page using an AgentQL query or a natural language prompt.
+
+```python
+from langchain_agentql.tools import ExtractWebDataTool, ExtractWebDataBrowserTool, GetWebElementBrowserTool
+from langchain_agentql import AgentQLBrowserToolkit
+```
+See our [tools documentation and usage example](/docs/integrations/tools/agentql).
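The provider page asks for `AGENTQL_API_KEY` in the environment, and the loader notebook also mentions an `api_key` parameter. A small pre-flight check on the caller's side might look like the sketch below. The helper `resolve_api_key` is hypothetical, and the precedence it applies (explicit key first, then the environment variable) is an assumption for illustration, not documented behavior of `langchain-agentql`:

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: pick an AgentQL API key or fail early.

    Assumed precedence: an explicitly passed key wins, otherwise the
    AGENTQL_API_KEY environment variable is used.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("AGENTQL_API_KEY")
    if not key:
        raise RuntimeError(
            "No AgentQL API key found: pass one explicitly or set AGENTQL_API_KEY."
        )
    return key


# Usage with a stand-in environment so the example runs without a real key.
print(resolve_api_key(env={"AGENTQL_API_KEY": "your-api-key-here"}))
```

Failing before any request is made keeps a missing key from surfacing later as a less obvious HTTP error.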
--- a/docs/docs/integrations/tools/agentql.ipynb
+++ b/docs/docs/integrations/tools/agentql.ipynb
--- a/docs/scripts/tool_feat_table.py
+++ b/docs/scripts/tool_feat_table.py
@ -147,6 +147,11 @@ WEBBROWSING_TOOL_FEAT_TABLE = {
        "interactions": True,
        "pricing": "40 free requests/day",
    },
+    "AgentQL Toolkit": {
+        "link": "/docs/integrations/tools/agentql",
+        "interactions": True,
+        "pricing": "Free trial, with pay-as-you-go and flat rate plans after",
+    },
 }

 DATABASE_TOOL_FEAT_TABLE = {
--- a/docs/src/theme/FeatureTables.js
+++ b/docs/src/theme/FeatureTables.js
@ -819,6 +819,13 @@ const FEATURE_TABLES = {
                source: "Platform for running and scaling headless browsers, can be used to scrape/crawl any site",
                api: "API",
                apiLink: "https://python.langchain.com/docs/integrations/document_loaders/hyperbrowser/"
+            },
+            {
+                name: "AgentQL",
+                link: "agentql",
+                source: "Web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt",
+                api: "API",
+                apiLink: "https://python.langchain.com/docs/integrations/document_loaders/agentql/"
            }
        ]
    },
--- a/libs/packages.yml
+++ b/libs/packages.yml
@ -513,3 +513,6 @@ packages:
 - name: langchain-opengradient
  path: .
  repo: OpenGradient/og-langchain
+- name: langchain-agentql
+  path: langchain
+  repo: tinyfish-io/agentql-integrations