Building applications with LLMs through composability
Cole Murray 43eef43550
security: Remove xslt_path and harden XML parsers in HTMLSectionSplitter: package: langchain-text-splitters (#31819)
## Summary
- Removes the `xslt_path` parameter from HTMLSectionSplitter to
eliminate an XXE attack vector
- Hardens XML/HTML parsers with secure configurations to prevent XXE
attacks
- Adds comprehensive security tests to ensure the vulnerability is fixed

## Context
This PR addresses a critical XXE vulnerability discovered in the
HTMLSectionSplitter component. The vulnerability allowed attackers to:
- Read sensitive local files (SSH keys, passwords, configuration files)
- Perform Server-Side Request Forgery (SSRF) attacks
- Exfiltrate data to attacker-controlled servers

## Changes Made
1. **Removed `xslt_path` parameter** - This eliminates the primary
attack vector where users could supply malicious XSLT files
2. **Hardened XML parsers** - Added security configurations to prevent
XXE attacks even with the default XSLT:
   - `no_network=True` - Blocks network access
   - `resolve_entities=False` - Prevents entity expansion
   - `load_dtd=False` - Disables DTD processing
   - `XSLTAccessControl.DENY_ALL` - Blocks all file/network I/O in XSLT
     transformations

3. **Added security tests** - New test file `test_html_security.py` with
comprehensive tests for various XXE attack vectors
4. **Updated existing tests** - Modified tests that were using the
removed `xslt_path` parameter
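
For reference, the parser settings listed in item 2 can be sketched with lxml roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the stylesheet and the names `XSLT_SOURCE`, `build_hardened_transform`, and `transform_html` are hypothetical.

```python
from lxml import etree

# Hypothetical built-in stylesheet for illustration only:
# it wraps the text of every <h1> in a <section>.
XSLT_SOURCE = b"""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <div>
      <xsl:for-each select="//h1">
        <section><xsl:value-of select="."/></section>
      </xsl:for-each>
    </div>
  </xsl:template>
</xsl:stylesheet>
"""


def build_hardened_transform(xslt_source: bytes) -> etree.XSLT:
    """Parse a stylesheet with entity, DTD, and network access disabled."""
    xslt_parser = etree.XMLParser(
        no_network=True,         # block network access while parsing
        resolve_entities=False,  # do not expand entities
        load_dtd=False,          # do not load DTDs
    )
    xslt_tree = etree.fromstring(xslt_source, parser=xslt_parser)
    # DENY_ALL blocks file and network I/O during the transformation itself.
    return etree.XSLT(xslt_tree, access_control=etree.XSLTAccessControl.DENY_ALL)


def transform_html(html: str) -> str:
    """Parse HTML without network access and apply the hardened transform."""
    html_parser = etree.HTMLParser(no_network=True)
    tree = etree.fromstring(html, parser=html_parser)
    transform = build_hardened_transform(XSLT_SOURCE)
    return str(transform(tree))


if __name__ == "__main__":
    print(transform_html("<html><body><h1>Title</h1><p>text</p></body></html>"))
```

Because the stylesheet is a constant inside the module rather than a caller-supplied path, the only XSLT that ever reaches the transformer is one the library controls, which is the point of removing `xslt_path`.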

## Test Plan
- [x] All existing tests pass
- [x] New security tests verify XXE attacks are blocked
- [x] Code passes linting and formatting checks
- [x] Tested with both old and new versions of lxml


Twitter handle: @_colemurray
2025-07-02 15:24:08 -04:00
| Name | Last commit | Date |
| --- | --- | --- |
| .devcontainer | community[minor]: Add ApertureDB as a vectorstore (#24088) | 2024-07-16 09:32:59 -07:00 |
| .github | Revert "infra: temporarily drop OpenAI from core release test matrix" (#31694) | 2025-06-20 22:12:38 +00:00 |
| cookbook | fix: update import paths for ChatOllama to use langchain_ollama instead of community (#31721) | 2025-06-24 16:19:31 -04:00 |
| docs | docs: Add PR info doc (#31833) | 2025-07-02 19:20:27 +00:00 |
| libs | security: Remove xslt_path and harden XML parsers in HTMLSectionSplitter: package: langchain-text-splitters (#31819) | 2025-07-02 15:24:08 -04:00 |
| scripts | | |
| .gitattributes | | |
| .gitignore | [performance]: Adding benchmarks for common langchain-core imports (#30747) | 2025-04-09 13:00:15 -04:00 |
| .pre-commit-config.yaml | voyageai: remove from monorepo (#31281) | 2025-05-19 16:33:38 +00:00 |
| .readthedocs.yaml | docs(readthedocs): streamline config (#30307) | 2025-03-18 11:47:45 -04:00 |
| CITATION.cff | | |
| LICENSE | | |
| Makefile | infra: Suppress error in make api_docs_clean if index.md is missing (#31129) | 2025-05-11 17:26:49 -04:00 |
| MIGRATE.md | Proofreading and Editing Report for Migration Guide (#28084) | 2024-11-13 11:03:09 -05:00 |
| poetry.toml | multiple: use modern installer in poetry (#23998) | 2024-07-08 18:50:48 -07:00 |
| pyproject.toml | docs: update agents tutorial to use langchain-tavily (#31637) | 2025-06-17 11:25:03 -04:00 |
| README.md | docs: fix Langgraph Platform URL in Readme file (#31341) | 2025-05-26 14:59:48 -04:00 |
| SECURITY.md | fix: typo in SECURITY.md (practicies -> practices) (#31509) | 2025-06-06 08:42:01 -04:00 |
| uv.lock | docs: update agents tutorial to use langchain-tavily (#31637) | 2025-06-17 11:25:03 -04:00 |
| yarn.lock | box: add langchain box package and DocumentLoader (#25506) | 2024-08-21 02:23:43 +00:00 |


Note

Looking for the JS/TS library? Check out LangChain.js.

LangChain is a framework for building LLM-powered applications. It helps you chain together interoperable components and third-party integrations to simplify AI application development — all while future-proofing decisions as the underlying technology evolves.

pip install -U langchain

To learn more about LangChain, check out the docs. If you're looking for more advanced customization or agent orchestration, check out LangGraph, our framework for building controllable agent workflows.

Why use LangChain?

LangChain helps developers build applications powered by LLMs through a standard interface for models, embeddings, vector stores, and more.

Use LangChain for:

  • Real-time data augmentation. Easily connect LLMs to diverse data sources and external / internal systems, drawing from LangChain's vast library of integrations with model providers, tools, vector stores, retrievers, and more.
  • Model interoperability. Swap models in and out as your engineering team experiments to find the best choice for your application's needs. As the industry frontier evolves, adapt quickly: LangChain's abstractions keep you moving without losing momentum.

LangChain's ecosystem

While the LangChain framework can be used standalone, it also integrates seamlessly with any LangChain product, giving developers a full suite of tools when building LLM applications.

To improve your LLM application development, pair LangChain with:

  • LangSmith - Helpful for agent evals and observability. Debug poor-performing LLM app runs, evaluate agent trajectories, gain visibility in production, and improve performance over time.
  • LangGraph - Build agents that can reliably handle complex tasks with LangGraph, our low-level agent orchestration framework. LangGraph offers customizable architecture, long-term memory, and human-in-the-loop workflows — and is trusted in production by companies like LinkedIn, Uber, Klarna, and GitLab.
  • LangGraph Platform - Deploy and scale agents effortlessly with a purpose-built deployment platform for long running, stateful workflows. Discover, reuse, configure, and share agents across teams — and iterate quickly with visual prototyping in LangGraph Studio.

Additional resources

  • Tutorials: Simple walkthroughs with guided examples on getting started with LangChain.
  • How-to Guides: Quick, actionable code snippets for topics such as tool calling, RAG use cases, and more.
  • Conceptual Guides: Explanations of key concepts behind the LangChain framework.
  • API Reference: Detailed reference on navigating base packages and integrations for LangChain.