Commit Graph

13650 Commits

Author SHA1 Message Date
Cole Murray
43eef43550
security: Remove xslt_path and harden XML parsers in HTMLSectionSplitter: package: langchain-text-splitters (#31819)
## Summary
- Removes the `xslt_path` parameter from HTMLSectionSplitter to
eliminate XXE attack vector
- Hardens XML/HTML parsers with secure configurations to prevent XXE
attacks
- Adds comprehensive security tests to ensure the vulnerability is fixed

  ## Context
This PR addresses a critical XXE vulnerability discovered in the
HTMLSectionSplitter component. The vulnerability allowed attackers to:
- Read sensitive local files (SSH keys, passwords, configuration files)
  - Perform Server-Side Request Forgery (SSRF) attacks
  - Exfiltrate data to attacker-controlled servers

  ## Changes Made
1. **Removed `xslt_path` parameter** - This eliminates the primary
attack vector where users could supply malicious XSLT files
2. **Hardened XML parsers** - Added security configurations to prevent
XXE attacks even with the default XSLT:
     - `no_network=True` - Blocks network access
- `resolve_entities=False` - Prevents entity expansion -
`load_dtd=False` - Disables DTD processing -
`XSLTAccessControl.DENY_ALL` - Blocks all file/network I/O in XSLT
transformations

3. **Added security tests** - New test file `test_html_security.py` with
comprehensive tests for various XXE attack vectors
4. **Updated existing tests** - Modified tests that were using the
removed `xslt_path` parameter

  ## Test Plan
  - [x] All existing tests pass
  - [x] New security tests verify XXE attacks are blocked
  - [x] Code passes linting and formatting checks
  - [x] Tested with both old and new versions of lxml


Twitter handle: @_colemurray
2025-07-02 15:24:08 -04:00
Mason Daugherty
815d11ed6a
docs: Add PR info doc (#31833) 2025-07-02 19:20:27 +00:00
Eugene Yurtsev
73fefe0295
core[path]: Use context manager for FileCallbackHandler (#31813)
Recommend using context manager for FileCallbackHandler to avoid opening
too many file descriptors

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-02 13:31:58 -04:00
ojumah20
377e5f5204
docs: Update agents.ipynb (#31820) 2025-07-02 10:42:28 -04:00
Mason Daugherty
eb12294583
langchain-xai[patch]: Add ruff bandit rules to linter (#31816)
- Add ruff bandit rules
- Some formatting
2025-07-01 18:59:06 +00:00
Mason Daugherty
86a698d1b6
langchain-qdrant[patch]: Add ruff bandit rules to linter (#31815)
- Add ruff bandit rules
- Address a few s101s
- Some formatting
2025-07-01 18:42:55 +00:00
Mason Daugherty
b03e326231
langchain-prompty[patch]: Add ruff bandit rules to linter (#31814)
- Add ruff bandit rules
- Address some s101 assertion warnings
- Address s506 by using `yaml.safe_load()`
2025-07-01 18:32:02 +00:00
Mason Daugherty
3190c4132f
langchain-perplexity[patch]: Add ruff bandit rules to linter (#31812)
- Add ruff bandit rules
2025-07-01 18:17:28 +00:00
Mason Daugherty
f30fe07620
update pyproject.toml flake8 comment (#31810) 2025-07-01 18:16:38 +00:00
Mason Daugherty
d0dce5315f
langchain-ollama[patch]: Add ruff bandit rules to linter (#31811)
- Add ruff bandit rules
2025-07-01 18:16:07 +00:00
Mason Daugherty
c9e1ce2966
groq: release 0.3.5 (#31809) 2025-07-01 13:21:23 -04:00
Mason Daugherty
404d8408f4
langchain-nomic[patch]: Add ruff bandit rules to linter (#31805)
- Add ruff bandit rules
- Some formatting
2025-07-01 11:39:11 -04:00
Mason Daugherty
0279af60b5
langchain-mistralai[patch]: Add ruff bandit rules to linter, formatting (#31803)
- Add ruff bandit rules
- Address a s101 error
- Formatting
2025-07-01 11:08:01 -04:00
Mason Daugherty
425ee52581
langchain-huggingface[patch]: Add ruff bandit rules to linter (#31798)
- Add ruff bandit rules
2025-07-01 11:07:52 -04:00
Mason Daugherty
0efaa483e4
langchain-groq[patch]: Add ruff bandit rules to linter (#31797)
- Add ruff bandit rules
- Address s105 errors
2025-07-01 11:07:42 -04:00
Mason Daugherty
479b6fd7c5
langchain-fireworks[patch]: Add ruff bandit rules to linter (#31796)
- Add ruff bandit rules
- Address a s113 error
2025-07-01 11:07:26 -04:00
Lauren Hirata Singh
625f7c3710
docs: Add forum link to footer (#31795)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
  - Example: "core: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
2025-06-30 16:14:03 -04:00
Anush
2d3020f6cd
docs: Update vectorstores feature matrix for Qdrant (#31786)
## Description

- `Qdrant` vector store supports `add_documents` with IDs.
- Multi-tenancy is supported via [payload
filters](https://qdrant.tech/documentation/guides/multiple-partitions/)
and
[JWT](https://qdrant.tech/documentation/guides/security/#granular-access-control-with-jwt)
if needed.
2025-06-30 14:02:07 -04:00
Mason Daugherty
33c9bf1adc
langchain-openai[patch]: Add ruff bandit rules to linter (#31788) 2025-06-30 14:01:32 -04:00
Mason Daugherty
645e25f624
langchain-anthropic[patch]: Add ruff bandit rules (#31789) 2025-06-30 14:00:53 -04:00
Mason Daugherty
247673ddb8
chroma: add ruff bandit rules (#31790) 2025-06-30 14:00:08 -04:00
Mason Daugherty
1a5120dc9d
langchain-deepseek[patch]: add ruff bandit rules (#31792)
add ruff bandit rules
2025-06-30 13:59:35 -04:00
Mason Daugherty
6572399174
langchain-exa: add ruff bandit rules (#31793)
Add ruff bandit rules
2025-06-30 13:58:38 -04:00
ccurme
04cc674e80
core: release 0.3.67 (#31791) 2025-06-30 12:00:39 -04:00
ccurme
46cef90f7b
core: expose tool message recognized block types (#31787) 2025-06-30 11:19:34 -04:00
ccurme
428c276948
infra: skip notebook in CI (#31773) 2025-06-28 14:00:45 -04:00
Yiwei
375f53adac
IBM DB2 vector store documentation addition (#31008) 2025-06-27 18:34:32 +00:00
ccurme
9f17fabc43
openai: release 0.3.27 (#31769)
To pick up https://github.com/langchain-ai/langchain/pull/31756.
2025-06-27 13:44:45 -04:00
Andrew Jaeger
0189c50570
openai[fix]: Correctly set usage metadata for OpenAI Responses API (#31756) 2025-06-27 15:35:14 +00:00
Mason Daugherty
9aa75eaef3
docs: enhance docstring for disable_streaming parameter in BaseChatModel (#31759)
Resolves #31758
2025-06-27 11:27:41 -04:00
ccurme
e8e89b0b82
docs: updates from langchain-openai 0.3.26 (#31764) 2025-06-27 11:27:25 -04:00
Eugene Yurtsev
eb08b064bb
docs: Remove giscus comments (#31755)
Remove giscus comments from langchain
2025-06-27 09:56:55 -04:00
Mason Daugherty
e1aff00cc1
groq: support reasoning_effort, update docs for clarity (#31754)
- There was some ambiguous wording that has been updated to hopefully
clarify the functionality of `reasoning_format` in ChatGroq.
- Added support for `reasoning_effort`
- Added links to see models capable of `reasoning_format` and
`reasoning_effort`
- Other minor nits
2025-06-27 09:43:40 -04:00
ccurme
ea1345a58b
openai[patch]: update cassette (#31752)
Following changes in `openai==1.92`.
2025-06-26 14:52:12 -04:00
ccurme
066be383e3
openai[patch]: update test following release of openai 1.92 (#31751)
Added new required fields for `ResponseFunctionWebSearch`
2025-06-26 18:22:58 +00:00
ccurme
61feaa4656
openai: release 0.3.26 (#31749) 2025-06-26 13:51:51 -04:00
ccurme
88d5f3edcc
openai[patch]: allow specification of output format for Responses API (#31686) 2025-06-26 13:41:43 -04:00
Mason Daugherty
59c2b81627
docs: fix some inline links (#31748) 2025-06-26 13:35:14 -04:00
Lauren Hirata Singh
83774902e7
docs: Academy banner (#31745)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
  - Example: "core: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
2025-06-26 10:03:06 -04:00
ccurme
0ae434be21
anthropic: release 0.3.16 (#31744) 2025-06-26 09:09:29 -04:00
Mason Daugherty
a08e73f07e
docs: remove trailing backticks (#31740) 2025-06-26 01:23:39 -04:00
Martin Schaer
421554007f
docs: Update surrealdb vectorestore documentation (#31199) 2025-06-25 20:16:43 +00:00
Mason Daugherty
2fb27b63f5
ollama: update tests, docs (#31736)
- docs: for the Ollama notebooks, improve the specificity of some links,
add `homebrew` install info, update some wording
- tests: reduce number of local models needed to run in half from 4 → 2
(shedding 8gb of required installs)
- bump deps (non-breaking) in anticipation of upcoming "thinking" PR
2025-06-25 20:13:20 +00:00
ccurme
a1f3147989
docs: update sort order for integrations table (#31737)
Pull latest download statistics.
2025-06-25 16:03:36 -04:00
ccurme
84500704ab
openai[patch]: fix bug where function call IDs were not populated (#31735)
(optional) IDs were getting dropped in some cases.
2025-06-25 19:08:27 +00:00
ccurme
0bf223d6cf
openai[patch]: add attribute to always use previous_response_id (#31734) 2025-06-25 19:01:43 +00:00
ccurme
b02bd67788
anthropic[patch]: cache clients (#31659) 2025-06-25 14:49:02 -04:00
Michael Li
e3f1ce0ac5
docs: fix retriever typos (#31733) 2025-06-25 16:06:21 +00:00
Michael Li
5d734ac8a8
docs: fix typo in clarifai.ipynb (#31732) 2025-06-25 15:59:29 +00:00
Michael Li
a09583204c
docs: fix typos in tool_feat_table.py (#31731) 2025-06-25 15:56:59 +00:00