Mirror of https://github.com/hwchase17/langchain.git (synced 2025-06-20 13:54:48 +00:00)

## Description

This PR adds a new `sitemap_url` parameter to the `GitbookLoader` class that allows users to specify a custom sitemap URL when loading content from a GitBook site. This is particularly useful for GitBook sites that use non-standard sitemap file names like `sitemap-pages.xml` instead of the default `sitemap.xml`.

The standard `GitbookLoader` assumes that the sitemap is located at `/sitemap.xml`, but some GitBook instances (including GitBook's own documentation) use different paths for their sitemaps. This parameter makes the loader more flexible and helps users extract content from a wider range of GitBook sites.

## Issue

Fixes bug [30473](https://github.com/langchain-ai/langchain/issues/30473) where the `GitbookLoader` would fail to find pages on GitBook sites that use custom sitemap URLs.

## Dependencies

No new dependencies required.

*I've added*:

* Unit tests to verify the parameter works correctly
* Integration tests to confirm the parameter is properly used with real GitBook sites
* Updated docstrings with parameter documentation

The changes are fully backward compatible, as the parameter is optional with a sensible default.

---------

Co-authored-by: andrasfe <andrasf94@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>

Name | ||
---|---|---|
parsers | ||
__init__.py | ||
test_arxiv.py | ||
test_astradb.py | ||
test_bigquery.py | ||
test_bilibili.py | ||
test_blockchain.py | ||
test_cassandra.py | ||
test_confluence.py | ||
test_couchbase.py | ||
test_csv_loader.py | ||
test_dataframe.py | ||
test_dedoc.py | ||
test_docusaurus.py | ||
test_duckdb.py | ||
test_email.py | ||
test_etherscan.py | ||
test_excel.py | ||
test_facebook_chat.py | ||
test_fauna.py | ||
test_figma.py | ||
test_geodataframe.py | ||
test_gitbook.py | ||
test_github.py | ||
test_google_speech_to_text.py | ||
test_ifixit.py | ||
test_joplin.py | ||
test_json_loader.py | ||
test_lakefs.py | ||
test_language.py | ||
test_larksuite.py | ||
test_llmsherpa.py | ||
test_mastodon.py | ||
test_max_compute.py | ||
test_modern_treasury.py | ||
test_news.py | ||
test_nuclia.py | ||
test_odt.py | ||
test_oracleds.py | ||
test_org_mode.py | ||
test_pdf.py | ||
test_polars_dataframe.py | ||
test_pubmed.py | ||
test_pyspark_dataframe_loader.py | ||
test_python.py | ||
test_quip.py | ||
test_recursive_url_loader.py | ||
test_rocksetdb.py | ||
test_rss.py | ||
test_rst.py | ||
test_sitemap.py | ||
test_slack.py | ||
test_spreedly.py | ||
test_sql_database.py | ||
test_stripe.py | ||
test_telegram.py | ||
test_tensorflow_datasets.py | ||
test_tidb.py | ||
test_tsv.py | ||
test_unstructured.py | ||
test_url_playwright.py | ||
test_url.py | ||
test_whatsapp_chat.py | ||
test_wikipedia.py | ||
test_xml.py | ||
test_xorbits.py |
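
The `sitemap_url` fallback described in the commit message can be illustrated with a small standalone sketch. It does not use `GitbookLoader` or reproduce its internals; `resolve_sitemap_url`, `extract_page_urls`, and `SAMPLE_SITEMAP` are hypothetical names introduced here, and the only assumption taken from the description is that the sitemap defaults to `/sitemap.xml` unless a custom URL is supplied.

```python
from typing import List, Optional
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset>/<url>/<loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def resolve_sitemap_url(base_url: str, sitemap_url: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit sitemap URL, else fall back
    to the conventional `<base_url>/sitemap.xml` location."""
    if sitemap_url:
        return sitemap_url
    return urljoin(base_url.rstrip("/") + "/", "sitemap.xml")


def extract_page_urls(sitemap_xml: str) -> List[str]:
    """Hypothetical helper: collect every <loc> entry from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


# Illustrative sitemap content (made up for this example, not fetched from a real site).
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/setup</loc></url>
</urlset>
""".strip()

if __name__ == "__main__":
    # Default location versus a custom file name such as sitemap-pages.xml.
    print(resolve_sitemap_url("https://example.com/docs"))
    print(resolve_sitemap_url("https://example.com/docs",
                              "https://example.com/docs/sitemap-pages.xml"))
    print(extract_page_urls(SAMPLE_SITEMAP))
```

Keeping the custom URL optional mirrors the backward-compatibility note in the description: callers that pass nothing still get the default `/sitemap.xml` path.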