mirror of
https://github.com/hwchase17/langchain.git
synced 2025-08-23 03:22:38 +00:00
docs(docs): add Oxylabs document loader (#32429)
Thank you for contributing to LangChain! Follow these steps to mark your pull request as ready for review. **If any of these steps are not completed, your PR will not be considered for review.**

- [x] **PR title**: Follows the format: {TYPE}({SCOPE}): {DESCRIPTION}
  - Examples:
    - feat(core): add multi-tenant support
    - fix(cli): resolve flag parsing error
    - docs(openai): update API usage examples
  - Allowed `{TYPE}` values:
    - feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert, release
  - Allowed `{SCOPE}` values (optional):
    - core, cli, langchain, standard-tests, docs, anthropic, chroma, deepseek, exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai, perplexity, prompty, qdrant, xai
  - Note: the `{DESCRIPTION}` must not start with an uppercase letter.
  - Once you've written the title, please delete this checklist item; do not include it in the PR.
- [x] **PR message**: ***Delete this entire checklist*** and replace with
  - **Description:** a description of the change. Include a [closing keyword](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) if applicable to a relevant issue.
  - **Issue:** the issue # it fixes, if applicable (e.g. Fixes #123)
  - **Dependencies:** any dependencies required for this change
  - **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, you must include:
  1. A test for the integration, preferably unit tests that do not rely on network access,
  2. An example notebook showing its use. It lives in `docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. **We will not consider a PR unless these three are passing in CI.** See [contribution guidelines](https://python.langchain.com/docs/contributing/) for more.

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to `pyproject.toml` files (even optional ones) unless they are **required** for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
This commit is contained in:
parent
4656f727da
commit
b2b835cb36
334
docs/docs/integrations/document_loaders/oxylabs.ipynb
Normal file
@ -0,0 +1,334 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Oxylabs"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"[Oxylabs](https://oxylabs.io/) is a web intelligence collection platform that enables companies worldwide to unlock data-driven insights.\n",
|
||||||
|
"\n",
|
||||||
|
"## Overview\n",
|
||||||
|
"\n",
|
||||||
|
"The Oxylabs document loader lets you load data from search engines, e-commerce sites, travel platforms, and any other website. It supports geolocation, browser rendering, data parsing, multiple user agents, and many other parameters. Check out the [Oxylabs documentation](https://developers.oxylabs.io/scraping-solutions/web-scraper-api) for more information.\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"### Integration details\n",
|
||||||
|
"\n",
|
||||||
|
"| Class | Package | Local | Serializable | Pricing |\n",
|
||||||
|
"|:--------------|:------------------------------------------------------------------|:-----:|:------------:|:-----------------------------:|\n",
|
||||||
|
"| OxylabsLoader | [langchain-oxylabs](https://github.com/oxylabs/langchain-oxylabs) | ✅ | ❌ | Free 5,000 results for 1 week |\n",
|
||||||
|
"\n",
|
||||||
|
"### Loader features\n",
|
||||||
|
"| Document Lazy Loading |\n",
|
||||||
|
"|:---------------------:|\n",
|
||||||
|
"| ✅ |\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Install the required dependencies.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"scrolled": true
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%pip install -U langchain-oxylabs"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Credentials\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Set your Oxylabs credentials as environment variables.\n",
|
||||||
|
"Sign up for a free trial or purchase the product\n",
|
||||||
|
"in the [Oxylabs dashboard](https://dashboard.oxylabs.io/en/registration)\n",
|
||||||
|
"to create your API user credentials (OXYLABS_USERNAME and OXYLABS_PASSWORD)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import getpass\n",
|
||||||
|
"import os\n",
|
||||||
|
"\n",
|
||||||
|
"os.environ[\"OXYLABS_USERNAME\"] = getpass.getpass(\"Enter your Oxylabs username: \")\n",
|
||||||
|
"os.environ[\"OXYLABS_PASSWORD\"] = getpass.getpass(\"Enter your Oxylabs password: \")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Initialization"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 14,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-08-06T10:57:51.630011Z",
|
||||||
|
"start_time": "2025-08-06T10:57:51.623814Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from langchain_oxylabs import OxylabsLoader"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 15,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-08-06T10:57:53.685413Z",
|
||||||
|
"start_time": "2025-08-06T10:57:53.628859Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"loader = OxylabsLoader(\n",
|
||||||
|
" urls=[\n",
|
||||||
|
" \"https://sandbox.oxylabs.io/products/1\",\n",
|
||||||
|
" \"https://sandbox.oxylabs.io/products/2\",\n",
|
||||||
|
" ],\n",
|
||||||
|
" params={\"markdown\": True},\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": "## Load"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 18,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-08-06T10:59:51.487327Z",
|
||||||
|
"start_time": "2025-08-06T10:59:48.592743Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"2751\n",
|
||||||
|
"[](/)\n",
|
||||||
|
"\n",
|
||||||
|
"Game platforms:\n",
|
||||||
|
"\n",
|
||||||
|
"* **All**\n",
|
||||||
|
"\n",
|
||||||
|
"* [Nintendo platform](/products/category/nintendo)\n",
|
||||||
|
"\n",
|
||||||
|
"+ wii\n",
|
||||||
|
"+ wii-u\n",
|
||||||
|
"+ nintendo-64\n",
|
||||||
|
"+ switch\n",
|
||||||
|
"+ gamecube\n",
|
||||||
|
"+ game-boy-advance\n",
|
||||||
|
"+ 3ds\n",
|
||||||
|
"+ ds\n",
|
||||||
|
"\n",
|
||||||
|
"* [Xbox platform](/products/category/xbox-platform)\n",
|
||||||
|
"\n",
|
||||||
|
"* **Dreamcast**\n",
|
||||||
|
"\n",
|
||||||
|
"* [Playstation platform](/products/category/playstation-platform)\n",
|
||||||
|
"\n",
|
||||||
|
"* **Pc**\n",
|
||||||
|
"\n",
|
||||||
|
"* **Stadia**\n",
|
||||||
|
"\n",
|
||||||
|
"Go Back\n",
|
||||||
|
"\n",
|
||||||
|
"Note!This is a sandbox website used for web scraping. Information listed in this website does not have any real meaning and should not be associated with the actual products.\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"## The Legend of Zelda: Ocarina of Time\n",
|
||||||
|
"\n",
|
||||||
|
"**Developer:** Nintendo**Platform:****Type:** singleplayer\n",
|
||||||
|
"\n",
|
||||||
|
"As a young boy, Link is tricked by Ganondorf, the King of the Gerudo Thieves. The evil human uses Link to g\n",
|
||||||
|
"5542\n",
|
||||||
|
"[](/)\n",
|
||||||
|
"\n",
|
||||||
|
"Game platforms:\n",
|
||||||
|
"\n",
|
||||||
|
"* **All**\n",
|
||||||
|
"\n",
|
||||||
|
"* [Nintendo platform](/products/category/nintendo)\n",
|
||||||
|
"\n",
|
||||||
|
"+ wii\n",
|
||||||
|
"+ wii-u\n",
|
||||||
|
"+ nintendo-64\n",
|
||||||
|
"+ switch\n",
|
||||||
|
"+ gamecube\n",
|
||||||
|
"+ game-boy-advance\n",
|
||||||
|
"+ 3ds\n",
|
||||||
|
"+ ds\n",
|
||||||
|
"\n",
|
||||||
|
"* [Xbox platform](/products/category/xbox-platform)\n",
|
||||||
|
"\n",
|
||||||
|
"* **Dreamcast**\n",
|
||||||
|
"\n",
|
||||||
|
"* [Playstation platform](/products/category/playstation-platform)\n",
|
||||||
|
"\n",
|
||||||
|
"* **Pc**\n",
|
||||||
|
"\n",
|
||||||
|
"* **Stadia**\n",
|
||||||
|
"\n",
|
||||||
|
"Go Back\n",
|
||||||
|
"\n",
|
||||||
|
"Note!This is a sandbox website used for web scraping. Information listed in this website does not have any real meaning and should not be associated with the actual products.\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"## Super Mario Galaxy\n",
|
||||||
|
"\n",
|
||||||
|
"**Developer:** Nintendo**Platform:****Type:** singleplayer\n",
|
||||||
|
"\n",
|
||||||
|
"[Metacritic's 2007 Wii Game of the Year] The ultimate Nintendo hero is taking the ultimate step ... out into space. Join Mario as he ushers in a new era of video games, de\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"for document in loader.load():\n",
|
||||||
|
" print(document.page_content[:1000])"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": "## Lazy Load"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "code",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"for document in loader.lazy_load():\n",
|
||||||
|
" print(document.page_content[:1000])"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Advanced examples\n",
|
||||||
|
"\n",
|
||||||
|
"The following examples show the usage of `OxylabsLoader` with geolocation, currency, pagination and user agent parameters for Amazon Search and Google Search sources."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 21,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-08-06T11:04:19.901122Z",
|
||||||
|
"start_time": "2025-08-06T11:04:19.838933Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"loader = OxylabsLoader(\n",
|
||||||
|
" queries=[\"gaming headset\", \"gaming chair\", \"computer mouse\"],\n",
|
||||||
|
" params={\n",
|
||||||
|
" \"source\": \"amazon_search\",\n",
|
||||||
|
" \"parse\": True,\n",
|
||||||
|
" \"geo_location\": \"DE\",\n",
|
||||||
|
" \"currency\": \"EUR\",\n",
|
||||||
|
" \"pages\": 3,\n",
|
||||||
|
" },\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 23,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-08-06T11:07:17.648142Z",
|
||||||
|
"start_time": "2025-08-06T11:07:17.595629Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"loader = OxylabsLoader(\n",
|
||||||
|
" queries=[\"europe gdp per capita\", \"us gdp per capita\"],\n",
|
||||||
|
" params={\n",
|
||||||
|
" \"source\": \"google_search\",\n",
|
||||||
|
" \"parse\": True,\n",
|
||||||
|
" \"geo_location\": \"Paris, France\",\n",
|
||||||
|
" \"user_agent_type\": \"mobile\",\n",
|
||||||
|
" },\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## API reference\n",
|
||||||
|
"\n",
|
||||||
|
"[More information about this package.](https://github.com/oxylabs/langchain-oxylabs)"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.11.9"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
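The `## Load` and `## Lazy Load` cells in the notebook above use identical loops, so the difference between the two calls is easy to miss. The sketch below is a stand-in only: `FakeLoader`, `FakeDocument`, and the `fetched` log are hypothetical names for illustration, not part of `langchain-oxylabs`, and no network request is made. It assumes the usual LangChain loader convention that `load()` returns a complete list while `lazy_load()` yields documents one at a time.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str


@dataclass
class FakeLoader:
    """Hypothetical stand-in loader; it never contacts Oxylabs."""

    urls: List[str]
    # Records each simulated fetch so the order of work is observable.
    fetched: List[str] = field(default_factory=list)

    def lazy_load(self) -> Iterator[FakeDocument]:
        # A URL is "fetched" only when the consumer asks for the next document.
        for url in self.urls:
            self.fetched.append(url)
            yield FakeDocument(page_content=f"content of {url}")

    def load(self) -> List[FakeDocument]:
        # Eager variant: drain the generator and return the full list.
        return list(self.lazy_load())


urls = [
    "https://sandbox.oxylabs.io/products/1",
    "https://sandbox.oxylabs.io/products/2",
]

eager = FakeLoader(urls)
docs = eager.load()  # every URL is processed before load() returns

lazy = FakeLoader(urls)
first = next(lazy.lazy_load())  # only the first URL has been processed so far
```

With many URLs or queries, the lazy form lets a caller stop early or process results as they arrive instead of holding every document in memory at once.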
|
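The credentials cell in the notebook always prompts through `getpass`, which blocks when the notebook runs non-interactively (for example in CI). A minimal sketch of one alternative, assuming the variables may already be exported in the environment: `ensure_env` is a hypothetical helper, not part of `langchain-oxylabs`, and its `prompt` parameter exists only so the prompt can be swapped out in tests.

```python
import getpass
import os
from typing import Callable


def ensure_env(name: str, prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Return os.environ[name], prompting for it only if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # Fall back to an interactive prompt and remember the answer.
        value = prompt(f"Enter {name}: ")
        os.environ[name] = value
    return value
```

Calling `ensure_env("OXYLABS_USERNAME")` and `ensure_env("OXYLABS_PASSWORD")` would then stand in for the two `os.environ[...] = getpass.getpass(...)` assignments.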
@ -856,6 +856,13 @@ const FEATURE_TABLES = {
|
|||||||
source: "Web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt",
|
source: "Web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt",
|
||||||
api: "API",
|
api: "API",
|
||||||
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/agentql/"
|
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/agentql/"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
name: "Oxylabs",
|
||||||
|
link: "oxylabs",
|
||||||
|
source: "Web intelligence platform that provides access to various data sources.",
|
||||||
|
api: "API",
|
||||||
|
apiLink: "https://github.com/oxylabs/langchain-oxylabs"
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|