langchain

mirror of https://github.com/hwchase17/langchain.git synced 2026-02-21 14:43:07 +00:00

Author	SHA1	Message	Date
John Kennedy	5b68956a0c	feat(middleware): add Tool Firewall defense stack for prompt injection Implements the complete defense stack from arXiv:2510.05244 and arXiv:2412.16682: 1. ToolInputMinimizerMiddleware (INPUT PROTECTION) - Filters tool arguments before execution - Prevents data exfiltration attacks - Based on Tool-Input Firewall from arXiv:2510.05244 2. TaskShieldMiddleware (TOOL USE PROTECTION) - Verifies actions align with user's goal - Blocks goal hijacking attacks - Based on Task Shield from arXiv:2412.16682 3. PromptInjectionDefenseMiddleware (OUTPUT PROTECTION) - Already existed, updated docstrings for clarity - Sanitizes tool outputs before agent processes them Defense stack achieves 0% ASR on AgentDojo, InjecAgent, ASB, tau-Bench benchmarks when used together. Usage: middleware=[ ToolInputMinimizerMiddleware(model), TaskShieldMiddleware(model), PromptInjectionDefenseMiddleware.check_then_parse(model), ]	2026-02-03 22:57:09 -08:00
John Kennedy	c2e64d0f43	adding google dps	2026-01-31 18:12:04 -08:00
John Kennedy	aa248def4a	fix: resolve mypy type errors in prompt injection defense - Fix ToolCallRequest import path (langgraph.prebuilt.tool_node) - Use Runnable type for model with tools bound (bind_tools returns Runnable) - Handle args_schema that may be dict or BaseModel	2026-01-31 18:10:18 -08:00
John Kennedy	608bc115b9	fix: skip baseline vulnerability tests by default in CI Add pytestmark to skip unless RUN_BENCHMARK_TESTS=1 is set, matching the other LLM-dependent test files.	2026-01-31 18:05:42 -08:00
John Kennedy	a35f869eb9	chore: add langchain-google-genai to test dependencies	2026-01-31 18:03:49 -08:00
John Kennedy	f761769de4	fix: resolve ruff linting errors and test parameter mismatch - Sort imports and __all__ in __init__.py - Add noqa comments for intentional fullwidth Unicode in DeepSeek markers - Add return type annotations to __init__ methods - Fix line length issues in prompt strings - Remove unnecessary else after return - Add noqa for lazy imports (intentional for avoiding circular imports) - Fix test_baseline_vulnerability.py parameter count (4 not 5)	2026-01-31 17:59:41 -08:00
John Kennedy	b2216bc600	feat(tests): compare all defense strategies in injection tests Update arg hijacking and add strategy comparison tests to evaluate: 1. CombinedStrategy (CheckTool + ParseData) 2. IntentVerificationStrategy alone 3. All three strategies combined Tests pass if any strategy successfully defends against the attack, helping identify which strategies work best for different attack types.	2026-01-31 17:39:46 -08:00
John Kennedy	0dd205f223	feat: add IntentVerificationStrategy for argument hijacking defense Adds a new defense strategy that detects when tool results attempt to override user-specified values (e.g., changing email recipients, modifying subjects). This complements CheckToolStrategy which detects unauthorized tool calls. - IntentVerificationStrategy compares tool results against user's original intent from conversation history - Uses same marker-based parsing as other strategies for injection safety - Add create_tool_request_with_user_message() helper for tests - Update arg hijacking tests to use IntentVerificationStrategy	2026-01-31 15:59:53 -08:00
John Kennedy	51a4e7d27a	feat(tests): add argument hijacking tests and Google Gemini support - Add argument hijacking test cases (BCC injection, subject manipulation, body append, recipient swap) to test subtle attacks where tool calls are expected but arguments are manipulated - Add Google Gemini (gemini-3-flash-preview) to all benchmark tests - Use granite4:small-h instead of tiny-h for more reliable Ollama tests - DRY up Ollama config by using constants from conftest	2026-01-31 15:37:57 -08:00
John Kennedy	76468eb28e	fix(tests): check tool triggering instead of string presence in injection tests The security property we care about is whether malicious tools are triggered, not whether malicious-looking strings appear in output. Data may legitimately contain URLs/emails that look suspicious but aren't actionable injections. - Replace string-based assertions with check_triggers_tools() that verifies the sanitized output doesn't trigger target tools when fed back to model - Remove assert_*_blocked functions that checked for domain strings - Simplify INJECTION_TEST_CASES to (payload, tools, tool_name, target_tools)	2026-01-31 15:13:42 -08:00
John Kennedy	345ab3870b	fixup! refactor: DRY up extended tests, focus on prompt injection only	2026-01-31 14:57:48 -08:00
John Kennedy	85360afd14	feat: add marker sanitization and filter mode for prompt injection defense - Add DEFAULT_INJECTION_MARKERS list covering major LLM providers: - Defense prompt delimiters - Llama/Mistral: [INST], <<SYS>> - OpenAI/Qwen (ChatML): <|im_start|>, <|im_end|> - Anthropic Claude: Human:/Assistant: (with newline prefix) - DeepSeek: fullwidth Unicode markers - Google Gemma: <start_of_turn>, <end_of_turn> - Vicuna: USER:/ASSISTANT: - Generic XML role markers - Add sanitize_markers() function to strip injection markers from content - Add configurable sanitize_markers param to CheckToolStrategy and ParseDataStrategy - Add filter mode (on_injection='filter') that uses model's text response when injection detected (no extra LLM call needed) - Add _get_tool_schema() for tool return type descriptions in ParseDataStrategy - Export DEFAULT_INJECTION_MARKERS and sanitize_markers from __init__.py	2026-01-31 14:55:23 -08:00
John Kennedy	62349bd6fd	test: reorganize prompt injection tests and skip E2E by default - Rename test_prompt_injection_combined.py -> test_prompt_injection_baseline_vs_protected.py - Delete redundant test_prompt_injection_defense_extended.py - Skip E2E tests by default (require RUN_BENCHMARK_TESTS=1) - Reduce Ollama models to frontier only (granite4:tiny-h) - Refactor to reduce code duplication in test files - Update docstrings with cross-references Test organization: - test_prompt_injection_defense.py: Unit tests with mocks (CI, fast) - test_prompt_injection_baseline_vs_protected.py: E2E baseline vs protected - test_prompt_injection_token_benchmark.py: Token usage benchmarks To run E2E tests with real models: RUN_BENCHMARK_TESTS=1 uv run pytest tests/unit_tests/agents/middleware/implementations/test_prompt_injection_* -svv	2026-01-31 14:55:23 -08:00
John Kennedy	46cdd2245b	test: refactor prompt injection tests to reduce duplication - Add helper functions: make_tool_message, make_tool_request, make_triggered_response, setup_model_with_response - Parameterize tests for providers, factory methods, strategies - Add fixtures for common mock setups - Consolidate similar test cases while maintaining coverage	2026-01-31 14:55:23 -08:00
John Kennedy	f03127e7fd	test: add combined baseline/protected test and token benchmark - test_prompt_injection_combined.py: single test shows both baseline vulnerability and protected status for each model/payload - test_prompt_injection_token_benchmark.py: measures token usage across no_defense, check_only, parse_only, and combined strategies	2026-01-31 14:55:23 -08:00
John Kennedy	e088029a63	test: expand injection payloads to 20 variants across 5 attack categories - 4 variants per attack category: - Goal hijacking: system update, priority override, casual, xml tags - Privilege escalation: html comment, js comment, maintenance, sql - Indirect trigger: sysadmin note, friendly, auto cleanup, doc metadata - Multi-language: spanish, german, chinese, portuguese - JSON/XML injection: notes field, internal directive, nested meta, xml - Use realistic test domains (test-sink.net, etc.) instead of obvious names - 260 tests total: 20 attacks × 13 models	2026-01-31 14:55:23 -08:00
John Kennedy	1fbf7cf910	refactor: simplify prompt injection tests, add shared conftest - Create conftest.py with shared tools, payloads, fixtures, and helpers - Consolidate extended tests into simple parametrized test classes - Add multi_language and json_injection to test cases (5 total attacks) - Baseline and protected tests now use same test cases for comparison - 65 tests each: 5 attacks × 13 models (3 OpenAI, 3 Anthropic, 7 Ollama)	2026-01-31 14:55:23 -08:00
John Kennedy	b7dac2c90b	fix: cleanup unused imports, add Anthropic to extended tests - Remove unused SystemMessage import - Fix reference to non-existent e2e test file - Add anthropic_model to all parameterized security tests - Now tests both OpenAI (gpt-5.2) and Anthropic (claude-sonnet-4-5)	2026-01-31 14:55:22 -08:00
John Kennedy	97b933ae1f	refactor: DRY up extended tests, focus on prompt injection only - Extract shared attack payloads as constants - Add helper functions for strategy creation and assertions - Parameterize Ollama tests to reduce duplication - Remove non-security tests (caching, false positives, safe content) - Update models to gpt-5.2 and claude-sonnet-4-5 - 11 tests total (7 OpenAI, 4 Ollama skipped)	2026-01-31 14:55:22 -08:00
John Kennedy	7b695f047a	Add PromptInjectionDefenseMiddleware with pluggable strategy pattern Implements defense against indirect prompt injection attacks from external/untrusted data sources (tool results, web content, etc.) based on the paper: 'Defense Against Indirect Prompt Injection via Tool Result Parsing' https://arxiv.org/html/2601.04795v1 Features: - Pluggable DefenseStrategy protocol for extensibility - Built-in strategies: CheckToolStrategy, ParseDataStrategy, CombinedStrategy - Factory methods for recommended configurations from paper - Focuses on tool results as primary attack vector - Extensible to other external data sources in the future Key components: - PromptInjectionDefenseMiddleware: Main middleware class - DefenseStrategy: Protocol for custom defense implementations - CheckToolStrategy: Detects and removes tool-triggering content - ParseDataStrategy: Extracts only required data with format constraints - CombinedStrategy: Chains multiple strategies together Usage: agent = create_agent( 'openai:gpt-4o', middleware=[ PromptInjectionDefenseMiddleware.check_then_parse('openai:gpt-4o'), ], )	2026-01-31 14:55:22 -08:00
ccurme	b50ecd49eb	release(standard-tests): 1.1.3 (#34949) langchain-standard-tests==1.1.3 langchain-tests==1.1.3	2026-01-31 16:39:23 -05:00
John Kennedy	c5834cc028	chore: upgrade urllib3 to 2.6.3 (#34940)	2026-01-31 16:30:17 -05:00
Jackjin	488db577e2	fix(core): prevent crash in ParrotFakeChatModel when messages list is empty (#34943)	2026-01-31 16:17:39 -05:00
Abhishek Laddha	ccd4032789	docs(docs): add uv sync step to local setup instructions (#34944)	2026-01-31 16:16:43 -05:00
Louis Auneau	f5252b438e	fix(core): google docstring parsing with no arguments/reserved arguments (#34861)	2026-01-30 22:48:58 -05:00
Lewis Whitehill	0c9d392d41	test(core): add tests for approximate token counting with multimodal messages (#34898)	2026-01-30 12:35:16 -08:00
Mason Daugherty	638c33f65d	fix(core): replace `Iterable` with `Iterator` for block iteration (#34934)	2026-01-30 12:08:22 -08:00
Mason Daugherty	017c8e05ec	fix(core): `yield_blobs` returns `Iterator` (#34935) Implementations using yield return generators, which are of type `Iterator`. This is technically a breaking change for implementers, however, known existing implementations (in `langchain-community`) use `yield`, so they already return `Iterator`s. For callers, it is not breaking. Closes #25718	2026-01-30 12:08:13 -08:00
wixarv	c09cba2f87	docs: add CONTRIBUTING.md pointing to online guide (#34901)	2026-01-30 11:47:11 -08:00
wixarv	8a81852a83	refactor: replace print with logger.info in llm_summarization_checker (#34903)	2026-01-30 11:12:54 -08:00
zvibo	13e6327d3f	docs: Fix typo in Runnable description of async variants (#34905)	2026-01-30 10:52:38 -08:00
Rohan Disa	4004883806	fix(xai): Live search deprecation (#34919)	2026-01-30 10:01:20 -08:00
zer0	6ff8436fb0	fix(core): raise outputparserexception for unknown tools (#34923)	2026-01-30 09:35:31 -08:00
Mason Daugherty	72571185a8	docs(core): nit (#34914)	2026-01-28 10:54:53 -08:00
Mason Daugherty	7e9c53ff8d	feat(infra): related issues for bug report template (#34913)	2026-01-28 10:54:14 -08:00
Mason Daugherty	f8d5a5069f	chore(core): nits (#34897)	2026-01-26 18:05:37 -08:00
cc	585b691c1d	feat(core): add multimodal support to count_tokens_approximately (#34883)	2026-01-26 15:04:25 -08:00
dependabot[bot]	c4e645cf13	chore(deps): bump actions/create-github-app-token from 1 to 2 (#34886)	2026-01-26 14:38:21 -08:00
Mason Daugherty	3da89bd380	feat(langchain): add `ToolCallRequest` to middleware exports (#34894) https://github.com/langchain-ai/docs/pull/2358	2026-01-26 11:54:14 -08:00
ccurme	c930062f69	chore(infra): re-enable tests on prior published packages on core release (#34881)	2026-01-25 20:36:40 -08:00
ccurme	aaba1b0bcb	release(xai): 1.2.2 (#34880) langchain-xai==1.2.2	2026-01-25 20:20:44 -08:00
Sholto Armstrong	666bb6fe53	fix(xai): fix routing of chat completions vs. responses apis during streaming (#34868)	2026-01-25 19:58:11 -08:00
Bodhi Russell Silberling	f0ca2c4675	fix(core): fix typo 'use a a' -> 'use as a' in check_version.py (#34878)	2026-01-25 19:26:22 -08:00
Mason Daugherty	11df1bedc3	style(core): lint (#34862) it looks scary but i promise it is not. improving documentation consistency across core. primarily update docstrings and comments for better formatting, readability, and accuracy, as well as add minor clarifications and formatting improvements to user-facing documentation.	2026-01-23 23:07:48 -05:00
Mason Daugherty	51f13f7bff	style(text-splitters): lint (#34865)	2026-01-23 23:07:36 -05:00
Mason Daugherty	17de2a3685	style(langchain): lint (#34863) it looks scary but i promise it is not. improving documentation consistency across langchain. primarily update docstrings and comments for better formatting, readability, and accuracy, as well as add minor clarifications and formatting improvements to user-facing documentation.	2026-01-23 23:00:44 -05:00
Mason Daugherty	72333ad644	fix(langchain): blocking unit test (#34866) =	2026-01-23 22:58:03 -05:00
Mason Daugherty	703d170a4a	style(model-profiles): lint (#34864)	2026-01-23 22:40:59 -05:00
Mason Daugherty	80e09feec1	docs: add Chat LangChain link and highlight Deep Agents (#34858)	2026-01-23 15:20:26 -05:00
Christophe Bornet	ca9d2c0bdd	test(langchain): use blockbuster to detect blocking calls in the async event loop (#34777)	2026-01-23 14:52:56 -05:00

15216 Commits