Mirror of https://github.com/hwchase17/langchain.git, synced 2025-09-28 15:00:23 +00:00
## Summary

- Removes the `xslt_path` parameter from HTMLSectionSplitter to eliminate XXE attack vector
- Hardens XML/HTML parsers with secure configurations to prevent XXE attacks
- Adds comprehensive security tests to ensure the vulnerability is fixed

## Context

This PR addresses a critical XXE vulnerability discovered in the HTMLSectionSplitter component. The vulnerability allowed attackers to:

- Read sensitive local files (SSH keys, passwords, configuration files)
- Perform Server-Side Request Forgery (SSRF) attacks
- Exfiltrate data to attacker-controlled servers

## Changes Made

1. **Removed `xslt_path` parameter** - This eliminates the primary attack vector where users could supply malicious XSLT files
2. **Hardened XML parsers** - Added security configurations to prevent XXE attacks even with the default XSLT:
   - `no_network=True` - Blocks network access
   - `resolve_entities=False` - Prevents entity expansion
   - `load_dtd=False` - Disables DTD processing
   - `XSLTAccessControl.DENY_ALL` - Blocks all file/network I/O in XSLT transformations
3. **Added security tests** - New test file `test_html_security.py` with comprehensive tests for various XXE attack vectors
4. **Updated existing tests** - Modified tests that were using the removed `xslt_path` parameter

## Test Plan

- [x] All existing tests pass
- [x] New security tests verify XXE attacks are blocked
- [x] Code passes linting and formatting checks
- [x] Tested with both old and new versions of lxml

Twitter handle: @_colemurray
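The parser hardening listed under "Changes Made" can be sketched with lxml roughly as follows. This is a minimal illustration, not the code from the PR: the helper names (`parse_untrusted`, `build_transform`), the identity stylesheet, and the sample payload are assumptions made for the example. Only the four settings themselves come from the description above.

```python
from lxml import etree

# Parser configured with the three flags named in the PR description.
SECURE_XML_PARSER = etree.XMLParser(
    no_network=True,         # block network access while parsing
    resolve_entities=False,  # leave entity references unexpanded
    load_dtd=False,          # do not load external DTDs
)

# Hypothetical identity stylesheet, used only for this illustration.
XSLT_SOURCE = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# Hypothetical XXE payload that tries to pull a local file into the document.
XXE_PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<r>&xxe;</r>"""


def parse_untrusted(xml_bytes: bytes) -> etree._Element:
    """Parse untrusted XML with the hardened parser."""
    return etree.fromstring(xml_bytes, parser=SECURE_XML_PARSER)


def build_transform() -> etree.XSLT:
    """Compile the stylesheet with all XSLT file and network I/O denied."""
    xslt_tree = etree.fromstring(XSLT_SOURCE, parser=SECURE_XML_PARSER)
    return etree.XSLT(xslt_tree, access_control=etree.XSLTAccessControl.DENY_ALL)


if __name__ == "__main__":
    # The entity reference is kept as-is instead of being replaced by file contents.
    print(etree.tostring(parse_untrusted(XXE_PAYLOAD)))
    # A benign document still transforms normally.
    print(str(build_transform()(parse_untrusted(b"<a><b>hi</b></a>"))))
```

With `resolve_entities=False`, the `&xxe;` reference stays in the tree as an unexpanded entity node, so the referenced file's contents never reach the parsed document.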
🦜✂️ LangChain Text Splitters
Quick Install
pip install langchain-text-splitters
What is it?
LangChain Text Splitters contains utilities for splitting a wide variety of text documents into chunks.
For full documentation see the API reference and the Text Splitters module in the main docs.
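To make "splitting into chunks" concrete, here is a minimal plain-Python sketch of fixed-size chunking with overlap. It is not the library's implementation and does not use its API. The function name `split_into_chunks` and its parameters are illustrative assumptions.

```python
def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into character windows of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    print(split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
    # ['abcd', 'defg', 'ghij']
```

The overlap keeps a little shared context between neighbouring chunks, which is a common reason to chunk this way. Real splitters usually also try to break on natural boundaries such as paragraphs or sentences.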
📕 Releases & Versioning
`langchain-text-splitters` is currently on version `0.0.x`.
Minor version increases will occur for:
- Breaking changes for any public interfaces NOT marked `beta`
Patch version increases will occur for:
- Bug fixes
- New features
- Any changes to private interfaces
- Any changes to `beta` features
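The versioning rules above can be read as a small decision function. The sketch below is only an illustration of that policy: the `Change` enum and `next_version` helper are hypothetical, and resetting the patch number to 0 on a minor bump is an assumption based on common semantic-versioning practice, which the README does not state.

```python
from enum import Enum, auto


class Change(Enum):
    """Kinds of change named in the release policy (hypothetical enum)."""
    BREAKING_PUBLIC = auto()    # breaking change to a public, non-beta interface
    BUG_FIX = auto()
    NEW_FEATURE = auto()
    PRIVATE_INTERFACE = auto()  # any change to private interfaces
    BETA_FEATURE = auto()       # any change to beta features


def next_version(current: str, change: Change) -> str:
    """Return the next version string for a change of the given kind."""
    major, minor, patch = (int(part) for part in current.split("."))
    if change is Change.BREAKING_PUBLIC:
        # Minor bump; resetting patch to 0 is an assumption, not stated in the README.
        return f"{major}.{minor + 1}.0"
    # Every other listed kind of change is a patch bump.
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(next_version("0.0.3", Change.BREAKING_PUBLIC))  # 0.1.0
    print(next_version("0.0.3", Change.NEW_FEATURE))      # 0.0.4
```

Note that under this policy new features land in patch releases, so pinning to a minor version still picks them up.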
💁 Contributing
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
For detailed information on how to contribute, see the Contributing Guide.