langchain/libs/community/tests/unit_tests
Andras L Ferenczi 64df60e690
community[minor]: Add custom sitemap URL parameter to GitbookLoader (#30549)
## Description
This PR adds a new `sitemap_url` parameter to the `GitbookLoader` class
that allows users to specify a custom sitemap URL when loading content
from a GitBook site. This is particularly useful for GitBook sites that
use non-standard sitemap file names like `sitemap-pages.xml` instead of
the default `sitemap.xml`.
By default, `GitbookLoader` assumes that the sitemap is located at
`/sitemap.xml`, but some GitBook instances (including GitBook's own
documentation) use different paths for their sitemaps. This parameter
makes the loader more flexible and helps users extract content from a
wider range of GitBook sites.
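The fallback behaviour described above can be sketched as follows. This is a minimal, standard-library-only illustration, not the loader's actual implementation: the helper names `resolve_sitemap_url` and `page_urls_from_sitemap` and the `docs.example.com` URLs are hypothetical, and only the `sitemap_url` parameter name and the `/sitemap.xml` default come from this PR description.

```python
from typing import List, Optional
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset>/<url>/<loc> elements.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def resolve_sitemap_url(web_page: str, sitemap_url: Optional[str] = None) -> str:
    """Hypothetical helper: pick the sitemap URL to fetch.

    Uses the custom sitemap_url when one is given; otherwise falls back to
    <web_page>/sitemap.xml, mirroring the default described in the PR.
    """
    if sitemap_url:
        return sitemap_url
    # A trailing slash makes urljoin append to the path instead of
    # replacing its last segment.
    base = web_page if web_page.endswith("/") else web_page + "/"
    return urljoin(base, "sitemap.xml")


def page_urls_from_sitemap(xml_text: str) -> List[str]:
    """Hypothetical helper: extract page URLs from sitemap XML text."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sitemap content (not fetched from a real site).
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://docs.example.com/intro</loc></url>"
    "<url><loc>https://docs.example.com/setup</loc></url>"
    "</urlset>"
)

default_url = resolve_sitemap_url("https://docs.example.com")
custom_url = resolve_sitemap_url(
    "https://docs.example.com",
    sitemap_url="https://docs.example.com/sitemap-pages.xml",
)
pages = page_urls_from_sitemap(SAMPLE_SITEMAP)
```

Because the custom value is optional and the default path is unchanged, callers that never pass `sitemap_url` would see the same behaviour as before.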
## Issue
Fixes
[#30473](https://github.com/langchain-ai/langchain/issues/30473), where
the `GitbookLoader` would fail to find pages on GitBook sites that use
custom sitemap URLs.
## Dependencies
No new dependencies required.
*I've added*:
* Unit tests to verify the parameter works correctly
* Integration tests to confirm the parameter is properly used with real
GitBook sites
* Parameter documentation in the updated docstrings
The changes are fully backward compatible, as the parameter is optional
with a sensible default.

---------

Co-authored-by: andrasfe <andrasf94@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2025-04-01 16:17:21 +00:00
agent_toolkits community: adds support for getting github releases for the configured repository (#29318) 2025-01-22 15:45:52 +00:00
agents community: add truncation params when an openai assistant's run is created (#28158) 2024-11-27 10:53:53 -05:00
callbacks Community : Add OpenAI prompt caching and reasoning tokens tracking (#27135) 2024-12-19 09:31:13 -05:00
chains core: add kwargs support to VectorStore (#25934) 2024-12-16 18:57:57 +00:00
chat_loaders community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
chat_message_histories community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
chat_models community: fix perplexity response parameters not being included in model response (#30440) 2025-03-26 22:28:08 -04:00
cross_encoders community[patch]: cross_encoders flatten namespaces (#20183) 2024-04-08 20:50:23 -04:00
data community: Resolve refs recursively when generating openai_fn from OpenAPI spec (#19002) 2024-09-02 13:17:39 -07:00
docstore infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
document_compressors community: add InfinityRerank (#27043) 2024-11-06 17:26:30 -08:00
document_loaders community[minor]: Add custom sitemap URL parameter to GitbookLoader (#30549) 2025-04-01 16:17:21 +00:00
document_transformers community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
embeddings community: Repair embeddings/llamacpp's embed_query method (#29935) 2025-02-23 19:32:17 +00:00
evaluation infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
examples community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
graph_vectorstores [community]: Render documents to graphviz (#24830) 2024-12-14 02:02:09 +00:00
graphs community: fix issue #29429 in age_graph.py (#29506) 2025-02-01 21:24:45 -05:00
imports community[minor]: Adds a vector store for Azure Cosmos DB for NoSQL (#21676) 2024-06-11 10:34:01 -07:00
indexes community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
jira community: fix Jira API wrapper failing initialization with cloud param (#30117) 2025-03-05 10:49:25 -05:00
llms infra: migrate to uv (#29566) 2025-02-06 13:36:26 -05:00
load core: add sambanova chat models to load module mapping (#29855) 2025-02-20 12:30:50 -05:00
query_constructors community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
retrievers community: add top_k as param to Needle Retriever (#29821) 2025-02-16 08:30:52 -05:00
storage community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
tools (Community): Added API Key for Jina Search API Wrapper (#29622) 2025-02-12 20:12:07 -08:00
utilities community[minor]: Improve Brave Search Tool, allow api key in env var (#30364) 2025-03-31 14:48:52 -04:00
utils partners: Use simsimd types (#25299) 2024-08-23 10:41:39 -04:00
vectorstores community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
__init__.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
conftest.py community: Use Blockbuster to detect blocking calls in asyncio during tests (#29609) 2025-02-08 01:10:39 +00:00
test_cache.py community: Use Blockbuster to detect blocking calls in asyncio during tests (#29609) 2025-02-08 01:10:39 +00:00
test_dependencies.py community: Use Blockbuster to detect blocking calls in asyncio during tests (#29609) 2025-02-08 01:10:39 +00:00
test_document_transformers.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
test_graph_vectorstores.py core, community: move graph vectorstores to community (#26678) 2024-09-19 11:38:14 -07:00
test_imports.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
test_sql_database_schema.py infra: add min version testing to pr test flow (#24358) 2024-07-19 22:03:19 +00:00
test_sql_database.py community: add HANA dialect to SQLDatabase (#30475) 2025-03-27 15:19:50 -04:00
test_sqlalchemy.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00