mirror of
https://github.com/hwchase17/langchain.git
docs: Add AgentQL provider doc, tool/toolkit doc and documentloader doc (#30144)
- **Description:** Added AgentQL docs for the provider page, tools page and documentloader page
- **Twitter handle:** @AgentQL

Repo: https://github.com/tinyfish-io/agentql-integrations/tree/main/langchain
PyPI: https://pypi.org/project/langchain-agentql/

If no one reviews your PR within a few days, please @-mention one of baskaryan, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
parent
23fa70f328
commit
49bdd3b6fe
265
docs/docs/integrations/document_loaders/agentql.ipynb
Normal file
@ -0,0 +1,265 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "wkUAAcGZNSJ3"
|
||||
},
|
||||
"source": [
|
||||
"# AgentQLLoader\n",
|
||||
"\n",
|
||||
"[AgentQL](https://www.agentql.com/)'s document loader provides structured data extraction from any web page using an [AgentQL query](https://docs.agentql.com/agentql-query). AgentQL can be used across multiple languages and web pages without breaking over time and change.\n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"`AgentQLLoader` requires the following two parameters:\n",
|
||||
"- `url`: The URL of the web page you want to extract data from.\n",
|
||||
"- `query`: The AgentQL query to execute. Learn more about [how to write an AgentQL query in the docs](https://docs.agentql.com/agentql-query) or test one out in the [AgentQL Playground](https://dev.agentql.com/playground).\n",
|
||||
"\n",
|
||||
"Setting the following parameters are optional:\n",
|
||||
"- `api_key`: Your AgentQL API key from [dev.agentql.com](https://dev.agentql.com). **`Optional`.**\n",
|
||||
"- `timeout`: The number of seconds to wait for a request before timing out. **Defaults to `900`.**\n",
|
||||
"- `is_stealth_mode_enabled`: Whether to enable experimental anti-bot evasion strategies. This feature may not work for all websites at all times. Data extraction may take longer to complete with this mode enabled. **Defaults to `False`.**\n",
|
||||
"- `wait_for`: The number of seconds to wait for the page to load before extracting data. **Defaults to `0`.**\n",
|
||||
"- `is_scroll_to_bottom_enabled`: Whether to scroll to bottom of the page before extracting data. **Defaults to `False`.**\n",
|
||||
"- `mode`: `\"standard\"` uses deep data analysis, while `\"fast\"` trades some depth of analysis for speed and is adequate for most usecases. [Learn more about the modes in this guide.](https://docs.agentql.com/accuracy/standard-mode) **Defaults to `\"fast\"`.**\n",
|
||||
"- `is_screenshot_enabled`: Whether to take a screenshot before extracting data. Returned in 'metadata' as a Base64 string. **Defaults to `False`.**\n",
|
||||
"\n",
|
||||
"AgentQLLoader is implemented with AgentQL's [REST API](https://docs.agentql.com/rest-api/api-reference)\n",
|
||||
"\n",
|
||||
"### Integration details\n",
|
||||
"\n",
|
||||
"| Class | Package | Local | Serializable | JS support |\n",
|
||||
"| :--- | :--- | :---: | :---: | :---: |\n",
|
||||
"| AgentQLLoader| langchain-agentql | ✅ | ❌ | ❌ |\n",
|
||||
"\n",
|
||||
"### Loader features\n",
|
||||
"| Source | Document Lazy Loading | Native Async Support\n",
|
||||
"| :---: | :---: | :---: |\n",
|
||||
"| AgentQLLoader | ✅ | ❌ |"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "CaKa2QrnwPXq"
|
||||
},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"To use the AgentQL Document Loader, you will need to configure the `AGENTQL_API_KEY` environment variable, or use the `api_key` parameter. You can acquire an API key from our [Dev Portal](https://dev.agentql.com)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "mZNJvUQBNSJ5"
|
||||
},
|
||||
"source": [
|
||||
"### Installation\n",
|
||||
"\n",
|
||||
"Install **langchain-agentql**."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "IblRoJJDNSJ5"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain_agentql"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "SNsUT60YvfCm"
|
||||
},
|
||||
"source": [
|
||||
"### Set Credentials"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"id": "2D1EN7Egvk1c"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"AGENTQL_API_KEY\"] = \"YOUR_AGENTQL_API_KEY\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "D4hnJV_6NSJ5"
|
||||
},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"Next instantiate your model object:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"id": "oMJdxL_KNSJ5"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_agentql.document_loaders import AgentQLLoader\n",
|
||||
"\n",
|
||||
"loader = AgentQLLoader(\n",
|
||||
" url=\"https://www.agentql.com/blog\",\n",
|
||||
" query=\"\"\"\n",
|
||||
" {\n",
|
||||
" posts[] {\n",
|
||||
" title\n",
|
||||
" url\n",
|
||||
" date\n",
|
||||
" author\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" \"\"\",\n",
|
||||
" is_scroll_to_bottom_enabled=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "SRxIOx90NSJ5"
|
||||
},
|
||||
"source": [
|
||||
"## Load"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "bNnnCZ1oNSJ5",
|
||||
"outputId": "d0eb8cb4-9742-4f0c-80f1-0509a3af1808"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(metadata={'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}, page_content=\"{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQL’s REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}\")"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs = loader.load()\n",
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "wtPMNh72NSJ5",
|
||||
"outputId": "59d529a4-3c22-445c-f5cf-dc7b24168906"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "7RMuEwl4NSJ5"
|
||||
},
|
||||
"source": [
|
||||
"## Lazy Load\n",
|
||||
"\n",
|
||||
"`AgentQLLoader` currently only loads one `Document` at a time. Therefore, `load()` and `lazy_load()` behave the same:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "FIYddZBONSJ5",
|
||||
"outputId": "c39a7a6d-bc52-4ef9-b36f-e1d138590b79"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'request_id': '06273abd-b2ef-4e15-b0ec-901cba7b4825', 'generated_query': None, 'screenshot': None}, page_content=\"{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQL’s REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}\")]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pages = [doc for doc in loader.lazy_load()]\n",
|
||||
"pages"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For more information on how to use this integration, please refer to the [git repo](https://github.com/tinyfish-io/agentql-integrations/tree/main/langchain) or the [langchain integration documentation](https://docs.agentql.com/integrations/langchain)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
35
docs/docs/integrations/providers/agentql.mdx
Normal file
@ -0,0 +1,35 @@
|
||||
# AgentQL
|
||||
|
||||
[AgentQL](https://www.agentql.com/) provides web interaction and structured data extraction from any web page using an [AgentQL query](https://docs.agentql.com/agentql-query) or a Natural Language prompt. AgentQL can be used across multiple languages and web pages without breaking as those pages change over time.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
Install the integration package:
|
||||
|
||||
```bash
|
||||
pip install langchain-agentql
|
||||
```
|
||||
|
||||
## API Key
|
||||
|
||||
Get an API Key from our [Dev Portal](https://dev.agentql.com/) and add it to your environment variables:
|
||||
```bash
|
||||
export AGENTQL_API_KEY="your-api-key-here"
|
||||
```
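
If you would rather set the key from Python (for example, in a notebook), a small helper can check for the variable and prompt only when it is missing. This is a minimal sketch: `ensure_agentql_api_key` is a hypothetical helper name and is not part of `langchain-agentql`; only the `AGENTQL_API_KEY` variable name comes from the docs above.

```python
import getpass
import os
from typing import Callable, MutableMapping


def ensure_agentql_api_key(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the AgentQL API key, prompting for it only if it is not set.

    Hypothetical helper for illustration; not part of langchain-agentql.
    """
    key = env.get("AGENTQL_API_KEY", "").strip()
    if not key:
        # getpass avoids echoing the key to the terminal or notebook output.
        key = prompt("AgentQL API key: ").strip()
        if not key:
            raise ValueError("AGENTQL_API_KEY is required")
        env["AGENTQL_API_KEY"] = key
    return key
```

Calling `ensure_agentql_api_key()` with no arguments reads and updates `os.environ`; the `env` and `prompt` parameters exist only so the helper can be exercised without touching the real environment.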
|
||||
|
||||
## DocumentLoader
|
||||
AgentQL's document loader provides structured data extraction from any web page using an AgentQL query.
|
||||
|
||||
```python
|
||||
from langchain_agentql.document_loaders import AgentQLLoader
|
||||
```
|
||||
See our [document loader documentation and usage example](/docs/integrations/document_loaders/agentql).
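
In the example output of the document loader notebook, `page_content` holds the extracted data as the string form of a Python dict (single-quoted keys), not as JSON. The sketch below shows one way to turn such a string back into a dict using only the standard library. It assumes that string format, which is inferred from the example output and may differ between versions; `parse_page_content` is a hypothetical helper, and the sample string is a shortened copy of that output.

```python
import ast
from typing import Any


def parse_page_content(page_content: str) -> dict[str, Any]:
    """Parse a dict-repr string into a dict (hypothetical helper).

    ast.literal_eval only evaluates Python literals, so unlike eval()
    it does not execute arbitrary code found in the text.
    """
    data = ast.literal_eval(page_content)
    if not isinstance(data, dict):
        raise ValueError("expected page_content to describe a dict")
    return data


# Shortened stand-in for docs[0].page_content from the notebook output.
sample = (
    "{'posts': [{'title': 'Updates to Our Pricing Model', "
    "'url': 'https://www.agentql.com/blog/2024-pricing-update', "
    "'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}]}"
)

posts = parse_page_content(sample)["posts"]
for post in posts:
    print(f"{post['date']}: {post['title']}")
```

With a real loader, the same helper would be applied to `docs[0].page_content` after `loader.load()`.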
|
||||
|
||||
## Tools and Toolkits
|
||||
AgentQL tools provide web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt.
|
||||
|
||||
```python
|
||||
from langchain_agentql.tools import ExtractWebDataTool, ExtractWebDataBrowserTool, GetWebElementBrowserTool
|
||||
from langchain_agentql import AgentQLBrowserToolkit
|
||||
```
|
||||
See our [tools documentation and usage example](/docs/integrations/tools/agentql).
|
1077
docs/docs/integrations/tools/agentql.ipynb
Normal file
File diff suppressed because it is too large
@ -147,6 +147,11 @@ WEBBROWSING_TOOL_FEAT_TABLE = {
|
||||
"interactions": True,
|
||||
"pricing": "40 free requests/day",
|
||||
},
|
||||
"AgentQL Toolkit": {
|
||||
"link": "/docs/integrations/tools/agentql",
|
||||
"interactions": True,
|
||||
"pricing": "Free trial, with pay-as-you-go and flat rate plans after",
|
||||
},
|
||||
}
|
||||
|
||||
DATABASE_TOOL_FEAT_TABLE = {
|
||||
|
@ -819,6 +819,13 @@ const FEATURE_TABLES = {
|
||||
source: "Platform for running and scaling headless browsers, can be used to scrape/crawl any site",
|
||||
api: "API",
|
||||
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/hyperbrowser/"
|
||||
},
|
||||
{
|
||||
name: "AgentQL",
|
||||
link: "agentql",
|
||||
source: "Web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt",
|
||||
api: "API",
|
||||
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/agentql/"
|
||||
}
|
||||
]
|
||||
},
|
||||
|
@ -513,3 +513,6 @@ packages:
|
||||
- name: langchain-opengradient
|
||||
path: .
|
||||
repo: OpenGradient/og-langchain
|
||||
- name: langchain-agentql
|
||||
path: langchain
|
||||
repo: tinyfish-io/agentql-integrations
|
||||
|