Commit Graph

7190 Commits

Author SHA1 Message Date
Cole Murray
43eef43550
security: Remove xslt_path and harden XML parsers in HTMLSectionSplitter: package: langchain-text-splitters (#31819)
## Summary
- Removes the `xslt_path` parameter from HTMLSectionSplitter to
eliminate XXE attack vector
- Hardens XML/HTML parsers with secure configurations to prevent XXE
attacks
- Adds comprehensive security tests to ensure the vulnerability is fixed

  ## Context
This PR addresses a critical XXE vulnerability discovered in the
HTMLSectionSplitter component. The vulnerability allowed attackers to:
- Read sensitive local files (SSH keys, passwords, configuration files)
  - Perform Server-Side Request Forgery (SSRF) attacks
  - Exfiltrate data to attacker-controlled servers

  ## Changes Made
1. **Removed `xslt_path` parameter** - This eliminates the primary
attack vector where users could supply malicious XSLT files
2. **Hardened XML parsers** - Added security configurations to prevent
XXE attacks even with the default XSLT:
     - `no_network=True` - Blocks network access
- `resolve_entities=False` - Prevents entity expansion -
`load_dtd=False` - Disables DTD processing -
`XSLTAccessControl.DENY_ALL` - Blocks all file/network I/O in XSLT
transformations

3. **Added security tests** - New test file `test_html_security.py` with
comprehensive tests for various XXE attack vectors
4. **Updated existing tests** - Modified tests that were using the
removed `xslt_path` parameter

  ## Test Plan
  - [x] All existing tests pass
  - [x] New security tests verify XXE attacks are blocked
  - [x] Code passes linting and formatting checks
  - [x] Tested with both old and new versions of lxml


Twitter handle: @_colemurray
2025-07-02 15:24:08 -04:00
Eugene Yurtsev
73fefe0295
core[path]: Use context manager for FileCallbackHandler (#31813)
Recommend using context manager for FileCallbackHandler to avoid opening
too many file descriptors

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-02 13:31:58 -04:00
Mason Daugherty
eb12294583
langchain-xai[patch]: Add ruff bandit rules to linter (#31816)
- Add ruff bandit rules
- Some formatting
2025-07-01 18:59:06 +00:00
Mason Daugherty
86a698d1b6
langchain-qdrant[patch]: Add ruff bandit rules to linter (#31815)
- Add ruff bandit rules
- Address a few s101s
- Some formatting
2025-07-01 18:42:55 +00:00
Mason Daugherty
b03e326231
langchain-prompty[patch]: Add ruff bandit rules to linter (#31814)
- Add ruff bandit rules
- Address some s101 assertion warnings
- Address s506 by using `yaml.safe_load()`
2025-07-01 18:32:02 +00:00
Mason Daugherty
3190c4132f
langchain-perplexity[patch]: Add ruff bandit rules to linter (#31812)
- Add ruff bandit rules
2025-07-01 18:17:28 +00:00
Mason Daugherty
f30fe07620
update pyproject.toml flake8 comment (#31810) 2025-07-01 18:16:38 +00:00
Mason Daugherty
d0dce5315f
langchain-ollama[patch]: Add ruff bandit rules to linter (#31811)
- Add ruff bandit rules
2025-07-01 18:16:07 +00:00
Mason Daugherty
c9e1ce2966
groq: release 0.3.5 (#31809) 2025-07-01 13:21:23 -04:00
Mason Daugherty
404d8408f4
langchain-nomic[patch]: Add ruff bandit rules to linter (#31805)
- Add ruff bandit rules
- Some formatting
2025-07-01 11:39:11 -04:00
Mason Daugherty
0279af60b5
langchain-mistralai[patch]: Add ruff bandit rules to linter, formatting (#31803)
- Add ruff bandit rules
- Address a s101 error
- Formatting
2025-07-01 11:08:01 -04:00
Mason Daugherty
425ee52581
langchain-huggingface[patch]: Add ruff bandit rules to linter (#31798)
- Add ruff bandit rules
2025-07-01 11:07:52 -04:00
Mason Daugherty
0efaa483e4
langchain-groq[patch]: Add ruff bandit rules to linter (#31797)
- Add ruff bandit rules
- Address s105 errors
2025-07-01 11:07:42 -04:00
Mason Daugherty
479b6fd7c5
langchain-fireworks[patch]: Add ruff bandit rules to linter (#31796)
- Add ruff bandit rules
- Address a s113 error
2025-07-01 11:07:26 -04:00
Mason Daugherty
33c9bf1adc
langchain-openai[patch]: Add ruff bandit rules to linter (#31788) 2025-06-30 14:01:32 -04:00
Mason Daugherty
645e25f624
langchain-anthropic[patch]: Add ruff bandit rules (#31789) 2025-06-30 14:00:53 -04:00
Mason Daugherty
247673ddb8
chroma: add ruff bandit rules (#31790) 2025-06-30 14:00:08 -04:00
Mason Daugherty
1a5120dc9d
langchain-deepseek[patch]: add ruff bandit rules (#31792)
add ruff bandit rules
2025-06-30 13:59:35 -04:00
Mason Daugherty
6572399174
langchain-exa: add ruff bandit rules (#31793)
Add ruff bandit rules
2025-06-30 13:58:38 -04:00
ccurme
04cc674e80
core: release 0.3.67 (#31791) 2025-06-30 12:00:39 -04:00
ccurme
46cef90f7b
core: expose tool message recognized block types (#31787) 2025-06-30 11:19:34 -04:00
Yiwei
375f53adac
IBM DB2 vector store documentation addition (#31008) 2025-06-27 18:34:32 +00:00
ccurme
9f17fabc43
openai: release 0.3.27 (#31769)
To pick up https://github.com/langchain-ai/langchain/pull/31756.
2025-06-27 13:44:45 -04:00
Andrew Jaeger
0189c50570
openai[fix]: Correctly set usage metadata for OpenAI Responses API (#31756) 2025-06-27 15:35:14 +00:00
Mason Daugherty
9aa75eaef3
docs: enhance docstring for disable_streaming parameter in BaseChatModel (#31759)
Resolves #31758
2025-06-27 11:27:41 -04:00
ccurme
e8e89b0b82
docs: updates from langchain-openai 0.3.26 (#31764) 2025-06-27 11:27:25 -04:00
Mason Daugherty
e1aff00cc1
groq: support reasoning_effort, update docs for clarity (#31754)
- There was some ambiguous wording that has been updated to hopefully
clarify the functionality of `reasoning_format` in ChatGroq.
- Added support for `reasoning_effort`
- Added links to see models capable of `reasoning_format` and
`reasoning_effort`
- Other minor nits
2025-06-27 09:43:40 -04:00
ccurme
ea1345a58b
openai[patch]: update cassette (#31752)
Following changes in `openai==1.92`.
2025-06-26 14:52:12 -04:00
ccurme
066be383e3
openai[patch]: update test following release of openai 1.92 (#31751)
Added new required fields for `ResponseFunctionWebSearch`
2025-06-26 18:22:58 +00:00
ccurme
61feaa4656
openai: release 0.3.26 (#31749) 2025-06-26 13:51:51 -04:00
ccurme
88d5f3edcc
openai[patch]: allow specification of output format for Responses API (#31686) 2025-06-26 13:41:43 -04:00
Mason Daugherty
59c2b81627
docs: fix some inline links (#31748) 2025-06-26 13:35:14 -04:00
ccurme
0ae434be21
anthropic: release 0.3.16 (#31744) 2025-06-26 09:09:29 -04:00
Martin Schaer
421554007f
docs: Update surrealdb vectorestore documentation (#31199) 2025-06-25 20:16:43 +00:00
Mason Daugherty
2fb27b63f5
ollama: update tests, docs (#31736)
- docs: for the Ollama notebooks, improve the specificity of some links,
add `homebrew` install info, update some wording
- tests: reduce number of local models needed to run in half from 4 → 2
(shedding 8gb of required installs)
- bump deps (non-breaking) in anticipation of upcoming "thinking" PR
2025-06-25 20:13:20 +00:00
ccurme
a1f3147989
docs: update sort order for integrations table (#31737)
Pull latest download statistics.
2025-06-25 16:03:36 -04:00
ccurme
84500704ab
openai[patch]: fix bug where function call IDs were not populated (#31735)
(optional) IDs were getting dropped in some cases.
2025-06-25 19:08:27 +00:00
ccurme
0bf223d6cf
openai[patch]: add attribute to always use previous_response_id (#31734) 2025-06-25 19:01:43 +00:00
ccurme
b02bd67788
anthropic[patch]: cache clients (#31659) 2025-06-25 14:49:02 -04:00
Michael Li
990a69d9d7
docs: fix typo in globals.py (#31728) 2025-06-25 11:47:02 -04:00
Mason Daugherty
3c3320ae30
fix: update import paths for ChatOllama to use langchain_ollama instead of community (#31721) 2025-06-24 16:19:31 -04:00
ccurme
e09abf8170
anthropic[patch]: add benchmark (#31718)
Account for lazy loading of clients in init time benchmark
2025-06-24 15:17:22 -04:00
Eugene Yurtsev
9164e6f906
core[patch]: Add additional hashing options to indexing API, warn on SHA-1 (#31649)
Add additional hashing options to the indexing API, warn on SHA-1

Requires:

- Bumping langchain-core version
- bumping min langchain-core in langchain

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-06-24 14:44:06 -04:00
Mason Daugherty
8878a7b143
docs: ollama nits (#31714) 2025-06-24 13:19:15 -04:00
Mason Daugherty
6d71b6b6ee
standard-tests: refactoring and fixes (#31703)
- `libs/core/langchain_core/messages/base.py`: add model name to
examples [per
docs](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.html#langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_usage_metadata)
("0.3.17: Additionally check for the presence of model_name in the
response metadata, which is needed for usage tracking in callback
handlers")
- `libs/core/langchain_core/utils/function_calling.py`: correct typo
-
`libs/standard-tests/langchain_tests/integration_tests/chat_models.py`:
- `magic_function(input)` -> `magic_function(_input)` to prevent warning
about redefining built in `input`
    - relocate a few tests for better grouping and narrative flow
    - suppress some type hint warnings following suit from similar tests
    - fix a few more typos
- validate not only that `model_name` is defined, but that it is not
empty (test_usage_metadata)
2025-06-23 23:22:31 +00:00
Christophe Bornet
c7e82ad95d
core: Use parametrized test in test_correct_get_tracer_project (#31513) 2025-06-23 18:55:57 -04:00
joshy-deshaw
8a0782c46c
openai[patch]: fix dropping response headers while streaming / Azure (#31580) 2025-06-23 17:59:58 -04:00
Mason Daugherty
8868701c16
docs: updated ChatGroq docs and example (#31710) 2025-06-23 20:36:46 +00:00
ccurme
ee83993b91
docs: document Anthropic cache TTL count details (#31708) 2025-06-23 20:16:42 +00:00
Mason Daugherty
e6191d58e7
groq: release 0.3.4 (#31709)
bump groq dependency to ensure reasoning is supported
2025-06-23 19:30:05 +00:00