community[minor]: add user agent for web scraping loaders (#22480)

**Description:** This PR adds a `USER_AGENT` environment variable to be used for web scraping. It adds a utility that reads that user agent and uses it in the scraping classes covered in [this piece of doc](https://python.langchain.com/v0.1/docs/use_cases/web_scraping/). Identifying your scraper is considered good politeness practice, and this PR aims to make that easier.
**Issue:** `None`
**Dependencies:** `None`
**Twitter handle:** `None`
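
For illustration, a utility of the kind described above could look roughly like the sketch below. This is not the code from this PR: the helper name `get_user_agent` and the fallback value `ExampleDefaultUserAgent` are assumptions made for the example. Only the `USER_AGENT` variable name comes from the description.

```python
import os

# Hypothetical fallback value; the default chosen by the PR is not shown here.
DEFAULT_USER_AGENT = "ExampleDefaultUserAgent"


def get_user_agent(default: str = DEFAULT_USER_AGENT) -> str:
    """Return the USER_AGENT environment variable, or `default` if it is unset or empty."""
    # `or` also covers the case where USER_AGENT is set to an empty string.
    return os.environ.get("USER_AGENT") or default
```

A loader could then take the returned string as its user agent, in the same way the notebook change in this commit passes `user_agent="MyAppUserAgent"` explicitly.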
2025-09-09 23:12:38 +00:00 · 2024-06-05 17:20:34 +02:00
parent 8250c177de
commit c3d4126eb1
5 changed files with 34 additions and 6 deletions
--- a/docs/docs/integrations/document_loaders/async_chromium.ipynb
+++ b/docs/docs/integrations/document_loaders/async_chromium.ipynb
@@ -48,7 +48,7 @@
    "from langchain_community.document_loaders import AsyncChromiumLoader\n",
    "\n",
    "urls = [\"https://www.wsj.com\"]\n",
-    "loader = AsyncChromiumLoader(urls)\n",
+    "loader = AsyncChromiumLoader(urls, user_agent=\"MyAppUserAgent\")\n",
    "docs = loader.load()\n",
    "docs[0].page_content[0:100]"
   ]