text_splitters: add request parameters for function HTMLHeaderTextSplitter.split_text… (#24178)

**Description:** The `split_text_from_url` method of `HTMLHeaderTextSplitter` calls `requests.get()` without exposing request parameters such as `timeout`. Therefore, I suggest adding a `**kwargs` parameter to the method and forwarding it to `requests.get()` internally, so that callers can control the GET request.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
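To illustrate the forwarding pattern this change relies on, here is a minimal sketch that runs without network access. `RecordingFetcher` and `fetch_html` are hypothetical names used only for illustration; they stand in for `requests.get` and for the first line of the patched method, and are not part of `langchain_text_splitters`.

```python
from typing import Any, Callable, Dict, List


class RecordingFetcher:
    """Hypothetical stand-in for requests.get that records how it was called."""

    def __init__(self, body: bytes) -> None:
        self.body = body
        self.calls: List[Dict[str, Any]] = []

    def __call__(self, url: str, **kwargs: Any) -> bytes:
        # Store the URL and every keyword argument that reached the fetcher.
        self.calls.append({"url": url, **kwargs})
        return self.body


def fetch_html(url: str, get: Callable[..., bytes], **kwargs: Any) -> bytes:
    """Hypothetical helper mirroring the patched call: extra keywords pass through unchanged."""
    return get(url, **kwargs)


fetcher = RecordingFetcher(b"<h1>Title</h1><p>Body</p>")
html = fetch_html(
    "https://example.com",
    fetcher,
    timeout=5,
    headers={"User-Agent": "demo"},
)
```

With the patched splitter, the equivalent call would presumably look like `splitter.split_text_from_url(url, timeout=10)`, where `timeout` is handled by `requests.get()` rather than by the splitter itself.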
2025-09-04 04:28:58 +00:00 · 2024-07-16 00:43:56 +08:00
parent 9d0c1d2dc9
commit d895614d19
1 changed file with 4 additions and 2 deletions
--- a/libs/text-splitters/langchain_text_splitters/html.py
+++ b/libs/text-splitters/langchain_text_splitters/html.py
@@ -71,13 +71,15 @@ class HTMLHeaderTextSplitter:
            for chunk in aggregated_chunks
        ]
-    def split_text_from_url(self, url: str) -> List[Document]:
+    def split_text_from_url(self, url: str, **kwargs: Any) -> List[Document]:
        """Split HTML from web URL
        Args:
            url: web URL
            **kwargs: Arbitrary additional keyword arguments. These are usually passed
                to the fetch url content request.
        """
-        r = requests.get(url)
+        r = requests.get(url, **kwargs)
        return self.split_text_from_file(BytesIO(r.content))
    def split_text(self, text: str) -> List[Document]: