Mirror of https://github.com/hwchase17/langchain.git, synced 2025-06-03 13:43:24 +00:00
**Description**

This PR introduces the `proxies` parameter to the `RecursiveUrlLoader` class, allowing the user to specify proxy servers for requests. This update enables crawling through proxy servers, providing enhanced flexibility for network configurations.

The key changes include:

1. Added an optional `proxies` parameter to the constructor (`__init__`).
2. Updated the documentation to explain the `proxies` parameter usage with an example.
3. Modified the `_get_child_links_recursive` method to pass the `proxies` parameter to the `requests.get` function.

**Sample Usage**

```python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader

# Route both plain and TLS traffic through a local proxy.
proxies = {
    "http": "http://localhost:1080",
    "https": "http://localhost:1080",
}
url = "https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel"
loader = RecursiveUrlLoader(
    url=url,
    max_depth=1,
    extractor=lambda x: Soup(x, "html.parser").text,
    proxies=proxies,
)
docs = loader.load()
```

---------

Co-authored-by: root <root@thb>

| Name |
|---|
| api_reference |
| cassettes |
| data |
| docs |
| scripts |
| src |
| static |
| .gitignore |
| .yarnrc.yml |
| babel.config.js |
| docusaurus.config.js |
| ignore-step.sh |
| Makefile |
| package.json |
| README.md |
| sidebars.js |
| vercel_requirements.txt |
| vercel.json |
| yarn.lock |
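The `proxies` mapping in the sample usage above is keyed by URL scheme, which is the convention `requests.get` expects. As a rough illustration of what that keying means, here is a minimal sketch. `pick_proxy` is a hypothetical helper written for this note, not part of LangChain or `requests`. It covers only plain scheme keys and ignores the host-specific keys, the `all` fallback, and the environment variables that the real lookup in `requests` may also consult.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def pick_proxy(url: str, proxies: Optional[Dict[str, str]]) -> Optional[str]:
    """Return the proxy URL configured for `url`, or None for a direct connection.

    Hypothetical helper for illustration only: a simplified, scheme-only
    version of the lookup an HTTP client performs on a `proxies` mapping.
    """
    if not proxies:
        return None
    # The mapping is keyed by the scheme of the URL being fetched,
    # not by the scheme of the proxy server itself.
    scheme = urlsplit(url).scheme.lower()
    return proxies.get(scheme)


proxies = {
    "http": "http://localhost:1080",
    "https": "http://localhost:1080",
}

print(pick_proxy("https://python.langchain.com/docs/concepts/", proxies))
print(pick_proxy("ftp://example.com/file", proxies))
```

Under these assumptions, an `https://` page is routed through the entry under the `"https"` key, while a URL whose scheme has no entry gets no proxy at all. This is why the sample usage lists both `"http"` and `"https"`.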
LangChain Documentation
For more information on contributing to our documentation, see the Documentation Contributing Guide.