langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-07-13 00:16:01 +00:00

Author	SHA1	Message	Date
ccurme	b78ae7817e	openai[patch]: trace strict in structured_output_kwargs (#30425 )	2025-03-21 14:37:28 -04:00
ccurme	1de7fa8f3a	Revert "deepseek: temporarily bypass tests" (#30424 ) Reverts langchain-ai/langchain#30423	2025-03-21 17:14:31 +00:00
ccurme	c74dfff836	deepseek: temporarily bypass tests (#30423 ) Deepseek infra is not stable enough to get through integration tests. Previous two attempts had two tests time out, they both pass locally.	2025-03-21 17:08:35 +00:00
ccurme	7147903724	deepseek: release 0.1.3 (#30422 )	2025-03-21 16:39:50 +00:00
Andras L Ferenczi	b5f49df86a	partner: ChatDeepSeek on openrouter not returning reasoning (#30240 ) The DeepSeek model does not return reasoning when hosted on OpenRouter (Issue [30067](https://github.com/langchain-ai/langchain/issues/30067)). The following code did not return reasoning:
```python
import os

from langchain_deepseek import ChatDeepSeek

llm = ChatDeepSeek(
    model="deepseek/deepseek-r1:nitro",
    api_base="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)
messages = [
    {"role": "system", "content": "You are an assistant."},
    {"role": "user", "content": "9.11 and 9.8, which is greater? Explain the reasoning behind this decision."},
]
response = llm.invoke(messages, extra_body={"include_reasoning": True})
print(response.content)
print(f"REASONING: {response.additional_kwargs.get('reasoning_content', '')}")
print(response)
```
The fix is to extract reasoning from `response.choices[0].message["model_extra"]` and from `choices[0].delta["reasoning"]`, and place it in the response's `additional_kwargs`. The change is really just the addition of a couple of one-line if statements. --------- Co-authored-by: andrasfe <andrasf94@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-21 16:35:37 +00:00
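The lookup order described in the commit above can be sketched with plain dictionaries. This is a minimal illustration, not the code from the PR: the helper name `extract_reasoning` and the dict-shaped message are hypothetical, and only the field names `reasoning_content`, `model_extra`, and `reasoning` come from the commit messages in this log.

```python
from __future__ import annotations

from typing import Any


def extract_reasoning(message: dict[str, Any]) -> str | None:
    """Return reasoning text from a provider message dict, if any is present."""
    # A top-level "reasoning_content" field is checked first.
    if message.get("reasoning_content"):
        return message["reasoning_content"]
    # Assumption: OpenRouter-style responses nest it under model_extra["reasoning"].
    extra = message.get("model_extra") or {}
    return extra.get("reasoning") or None


# Hypothetical message shaped like an OpenRouter-hosted response.
hosted = {"content": "9.8 is greater.", "model_extra": {"reasoning": "Compare tenths first."}}
additional_kwargs = {}
reasoning = extract_reasoning(hosted)
if reasoning is not None:
    additional_kwargs["reasoning_content"] = reasoning
```

Keeping the fallback in one helper means the caller only has to decide whether to copy the result into `additional_kwargs`.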
Vadym Barda	4852ab8d0a	core[patch]: more tests for trim_messages (#30421 )	2025-03-21 16:19:52 +00:00
ccurme	e8e3b2bfae	ollama: release 0.3.0 (#30420 )	2025-03-21 15:50:08 +00:00
Bob Merkus	5700646cc5	ollama: add reasoning model support (e.g. deepseek) (#29689 )
# Description
This PR adds reasoning model support for `langchain-ollama` by extracting reasoning token blocks, like those used in deepseek. It was inspired by [ollama-deep-researcher](https://github.com/langchain-ai/ollama-deep-researcher), specifically the parsing of [thinking blocks](`6d1aaf2139/src/assistant/graph.py (L91)`):
```python
# TODO: This is a hack to remove the <think> tags w/ Deepseek models
# It appears very challenging to prompt them out of the responses
while "<think>" in running_summary and "</think>" in running_summary:
    start = running_summary.find("<think>")
    end = running_summary.find("</think>") + len("</think>")
    running_summary = running_summary[:start] + running_summary[end:]
```
The comment notes that it is very hard to suppress the reasoning block through prompting, but we actually want the model to reason in order to increase model performance. This implementation extracts the thinking block, so the client can still expect a proper message to be returned by `ChatOllama` (and use the reasoning content separately when desired). This implementation takes the same approach as [ChatDeepseek](`5d581ba22c/libs/partners/deepseek/langchain_deepseek/chat_models.py (L215)`), which adds the reasoning content to `additional_kwargs["reasoning_content"]`:
```python
if hasattr(response.choices[0].message, "reasoning_content"):  # type: ignore
    rtn.generations[0].message.additional_kwargs["reasoning_content"] = (
        response.choices[0].message.reasoning_content  # type: ignore
    )
```
This should probably be handled upstream in ollama + ollama-python, but this seems like a reasonably effective solution.
This is a standalone example of what is happening:
```python
from collections.abc import AsyncIterator
from typing import Any

from langchain_core.language_models import BaseChatModel
from langchain_core.messages import BaseMessage, BaseMessageChunk
from langchain_core.runnables import RunnableConfig


async def deepseek_message_astream(
    llm: BaseChatModel,
    messages: list[BaseMessage],
    config: RunnableConfig | None = None,
    *,
    model_target: str = "deepseek-r1",
    **kwargs: Any,
) -> AsyncIterator[BaseMessageChunk]:
    """Stream responses from Deepseek models, filtering out <think> tags.

    Args:
        llm: The language model to stream from
        messages: The messages to send to the model

    Yields:
        Filtered chunks from the model response
    """
    # check if the model is deepseek based
    if (llm.name and model_target not in llm.name) or (
        hasattr(llm, "model") and model_target not in llm.model
    ):
        async for chunk in llm.astream(messages, config=config, **kwargs):
            yield chunk
        return

    # Yield with a buffer, upon completing the <think></think> tags, move them to the reasoning content and start over
    buffer = ""
    async for chunk in llm.astream(messages, config=config, **kwargs):
        # start or append
        if not buffer:
            buffer = chunk.content
        else:
            buffer += chunk.content if hasattr(chunk, "content") else chunk

        # Process buffer to remove <think> tags
        if "<think>" in buffer or "</think>" in buffer:
            if hasattr(chunk, "tool_calls") and chunk.tool_calls:
                raise NotImplementedError("tool calls during reasoning should be removed?")
            if "<think>" in chunk.content or "</think>" in chunk.content:
                # upon block completion, reset the buffer before dropping the tag chunk,
                # so the first answer chunk is not treated as reasoning
                if "<think>" in buffer and "</think>" in buffer:
                    buffer = ""
                continue
            chunk.additional_kwargs["reasoning_content"] = chunk.content
            chunk.content = ""
        yield chunk
```
# Issue
Integrating reasoning models (e.g. deepseek-r1) into existing LangChain based workflows is hard due to the thinking blocks that are included in the message contents. To avoid this, we could match the `ChatOllama` integration with `ChatDeepseek` to return the reasoning content inside `message.additional_kwargs["reasoning_content"]` instead.
# Dependencies
None
--------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-21 15:44:54 +00:00
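For a complete (non-streamed) response, the same separation of `<think>` blocks from the answer can be sketched with a regular expression. This is a minimal sketch assuming the tags are well-formed and not nested; the helper name `split_reasoning` is hypothetical and is not part of `langchain-ollama`.

```python
from __future__ import annotations

import re

# Non-greedy match so several <think> blocks in one response are handled separately.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (content, reasoning) with all <think> blocks moved out of the content."""
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    content = THINK_BLOCK.sub("", text).strip()
    return content, reasoning


content, reasoning = split_reasoning("<think>2 > 1</think>The answer is 2.")
# A caller could then store `reasoning` under additional_kwargs["reasoning_content"].
```

Unlike the streaming version in the commit message, this variant needs the whole text up front, which keeps the logic to two regex calls.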
ccurme	d8145dda95	xai: release 0.2.2 (#30403 )	2025-03-20 20:25:16 +00:00
ccurme	e194902994	mistral: release 0.2.9 (#30402 )	2025-03-20 20:22:24 +00:00
ccurme	49466ec9ca	groq: release 0.3.1 (#30401 )	2025-03-20 20:19:49 +00:00
ccurme	db1e340387	fireworks: release 0.2.8 (#30400 )	2025-03-20 16:15:51 -04:00
ccurme	785a8e7d45	tests: release 0.3.15 (#30397 )	2025-03-20 15:38:40 -04:00
ccurme	5588ca4cfb	core: release 0.3.47 (#30396 )	2025-03-20 18:52:53 +00:00
ccurme	de3960d285	multiple: enforce standards on tool_choice (#30372 ) - Test if models support forcing tool calls via `tool_choice`. If they do, they should support - `"any"` to specify any tool - the tool name as a string to force calling a particular tool - Add `tool_choice` to signature of `BaseChatModel.bind_tools` in core - Deprecate `tool_choice_value` in standard tests in favor of a boolean `has_tool_choice` Will follow up with PRs in external repos (tested in AWS and Google already).	2025-03-20 17:48:59 +00:00
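The two `tool_choice` forms named in the commit above (`"any"` and a tool name given as a string) can be illustrated with a small normalizer. This is a sketch only: the function `normalize_tool_choice` is hypothetical, and the output shapes assume an OpenAI-style wire format rather than anything stated in the commit.

```python
from __future__ import annotations

from typing import Any


def normalize_tool_choice(tool_choice: str, tool_names: list[str]) -> str | dict[str, Any]:
    """Map a LangChain-style tool_choice string to an OpenAI-style payload value."""
    if tool_choice == "any":
        # "any" means the model must call some tool; OpenAI-style APIs spell this "required".
        return "required"
    if tool_choice in ("auto", "none", "required"):
        return tool_choice
    if tool_choice in tool_names:
        # A bare tool name forces a call to that specific tool.
        return {"type": "function", "function": {"name": tool_choice}}
    raise ValueError(f"Unknown tool_choice {tool_choice!r}; expected one of {tool_names}")


forced = normalize_tool_choice("get_weather", ["get_weather", "get_time"])
```

Rejecting unknown names early avoids sending a request that the provider would refuse anyway.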
ccurme	b86cd8270c	multiple: support `strict` and `method` in with_structured_output (#30385 )	2025-03-20 13:17:07 -04:00
Mohammad Mohtashim	1103bdfaf1	(Ollama) Fix String Value parsing in _parse_arguments_from_tool_call (#30154 ) - Description: Fix String Value parsing in _parse_arguments_from_tool_call - Issue: #30145 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-19 21:47:18 -04:00
Tim König	b5992695ae	community: add ZoteroRetriever (#30270 ) Description This contribution adds a retriever for the Zotero API. [Zotero](https://www.zotero.org/) is an open source reference management for bibliographic data and related research materials. A retriever will allow langchain applications to retrieve relevant documents from personal or shared group libraries, which I believe will be helpful for numerous applications, such as RAG systems, personal research assistants, etc. Tests and docs were added. The documentation provided assumes the retriever will be part of the langchain-community package, as this seemed customary. Please let me know if this is not the preferred way to do it. I also uploaded the implementation to PyPI. Dependencies The retriever requires the `pyzotero` package for API access. This dependency is stated in the docs, and the retriever will return an error if the package is not found. However, this dependency is not added to the langchain package itself. Twitter handle I'm no longer using Twitter, but I'd appreciate a shoutout on [Bluesky](https://bsky.app/profile/koenigt.bsky.social) or [LinkedIn](https://www.linkedin.com/in/dr-tim-k%C3%B6nig-534aa2324/)! Let me know if there are any issues, I'll gladly try and sort them out! --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-19 20:19:32 -04:00
pulvedu	4346aca5cf	Integration update (#30381 ) This pull request includes a change to the following - docs/docs/integrations/tools/tavily_search.ipynb - docs/docs/integrations/tools/tavily_extract.ipynb - added docs/docs/integrations/providers/tavily.mdx --------- Co-authored-by: pulvedu <dustin@tavily.com>	2025-03-19 17:58:25 -04:00
Daniel Rauber	9b687d7fbd	community[minor]: PlaywrightURLLoader can take stored session file (#30152 ) Description: Implements an additional `browser_session` parameter on PlaywrightURLLoader which can be used to initialize the browser context by providing a stored playwright context.	2025-03-19 16:29:07 -04:00
Vadym Barda	73c04f4707	core[patch]: release 0.3.46 (#30383 )	2025-03-19 15:09:08 -04:00
William FH	ce84f8ba7e	Dereference run tree (#30377 )	2025-03-19 19:05:06 +00:00
William FH	8265be4d3e	Unset context to None in var (#30380 )	2025-03-19 18:53:17 +00:00
William FH	4130e6476b	Unset context after step (#30378 ) While we are already careful to copy before setting the config, if other objects hold a reference to the config or context, it wouldn't be cleared.	2025-03-19 11:46:23 -07:00
Vadym Barda	37190881d3	core[patch]: add util for approximate token counting (#30373 )	2025-03-19 17:48:38 +00:00
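Approximate token counting is typically a character-based heuristic. The sketch below shows the general idea only; the function name `approx_token_count` and the ratio of four characters per token are assumptions for illustration and are not taken from the utility added in #30373.

```python
import math


def approx_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate the token count of `text` without running a tokenizer."""
    # Round up so that any non-empty text counts as at least one token.
    return math.ceil(len(text) / chars_per_token)


estimate = approx_token_count("hello world!")  # 12 characters
```

An estimate like this trades accuracy for speed, which is usually acceptable when trimming message history to a rough budget.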
Matthew Farrellee	5f812f5968	langchain-tests: skip instead of passing image message tests (#30375 ) Description: use skip for image message tests	2025-03-19 15:35:32 +00:00
ccurme	aae8306d6c	groq: release 0.3.0 (#30374 )	2025-03-19 15:23:30 +00:00
Ashwin	83cfb9691f	Fix typo: change 'ben' to 'be' in comment (#30358 ) Description: This PR fixes a minor typo in the comments within `libs/partners/openai/langchain_openai/chat_models/base.py`. The word "ben" has been corrected to "be" for clarity and professionalism. Issue: N/A Dependencies: None	2025-03-19 10:35:35 -04:00
Florian Chappaz	07cb41ea9e	community: aligning ChatLiteLLM default parameters with litellm (#30360 ) Description: Since `ChatLiteLLM` is forwarding most parameters to `litellm.completion(...)`, there is no reason to set other default values than the ones defined by `litellm`. In the case of parameter 'n', it also provokes an issue when trying to call a serverless endpoint on Azure, as it is considered an extra parameter. So we need to keep it optional. We can debate about backward compatibility of this change: in my opinion, there should not be big issues since from my experience, calling `litellm.completion()` without these parameters works fine. Issue: - #29679 Dependencies: None	2025-03-19 09:07:28 -04:00
Hodory	57ffacadd0	community: add keep_newlines parameter to process_pages method (#30365 ) - Description: Adding keep_newlines parameter to process_pages method with page_ids on Confluence document loader - Issue: N/A (This is an enhancement rather than a bug fix) - Dependencies: N/A - Twitter handle: N/A	2025-03-19 08:57:59 -04:00
William FH	f5a0092551	Rm test for parent_run presence (#30356 )	2025-03-18 19:44:19 -07:00
Adam Brenner	f949d9a3d3	docs: Add Dell PowerScale Document Loader (#30209 ) # Description Adds documentation on the LangChain website for a Dell-specific document loader for on-prem storage devices. Additional details on the document loader are described in the PR as well as in our GitHub repo: [https://github.com/dell/powerscale-rag-connector](https://github.com/dell/powerscale-rag-connector) This PR also creates a category on the document loader webpage, as no category exists for on-prem. This follows the existing pattern, as the website already has a category for cloud providers. # Issue: New release, no issue. # Dependencies: None # Twitter handle: DellTech --------- Signed-off-by: Adam Brenner <adam@aeb.io> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-18 22:39:21 -04:00
ccurme	9fb0db6937	community: release 0.3.20 (#30354 )	2025-03-18 21:57:12 +00:00
ccurme	168f1dfd93	langchain[patch]: update text-splitters min bound (#30352 )	2025-03-18 20:53:43 +00:00
ccurme	f6cf2ce2ad	langchain[patch]: lock with latest text-splitters (#30350 )	2025-03-18 19:29:11 +00:00
ccurme	2909b49045	langchain: release 0.3.21 (#30348 )	2025-03-18 19:13:20 +00:00
ccurme	958f85d541	text-splitters: release 0.3.7 (#30347 )	2025-03-18 19:11:37 +00:00
Lance Martin	46d6bf0330	ollama[minor]: update default method for structured output (#30273 ) From function calling to Ollama's [dedicated structured output feature](https://ollama.com/blog/structured-outputs). --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-18 12:44:22 -04:00
Marlene	ff8ce60dcc	Core: Adding Azure AI to Supported Chat Models (#30342 ) - Description: I was testing out `init_chat` and saw that chat models can now be inferred. Currently only Azure OpenAI is supported, but we would like to add support for Azure AI, which is a different package. This PR edits the `base.py` file to add the chat implementation. - I don't think this adds any additional dependencies - Will add a test and lint, but starting an initial draft PR. cc @santiagxf --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-18 11:53:20 -04:00
TheSongg	251551ccf1	doc: Implement langchain-xinference (#30296 ) Implement a standalone package for Xinference chat models and LLM models. https://github.com/langchain-ai/langchain/issues/30045#issue-2887214214	2025-03-18 11:50:16 -04:00
wenmeng zhou	5a6e1254a7	support return reasoning content for models like qwq in dashscope (#30317 ) Here is an example:
```python
from langchain_community.chat_models.tongyi import ChatTongyi
from langchain_core.messages import HumanMessage

chatLLM = ChatTongyi(
    model="qwq-32b",  # refer to https://help.aliyun.com/zh/model-studio/getting-started/models for more models
)
res = chatLLM.stream([HumanMessage(content="how much is 1 plus 1")])
for r in res:
    print(r)
```
```shell content='' additional_kwargs={'reasoning_content': 'Okay, so the'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' user is asking "'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': 'how much is 1 plus'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 1." Let me think'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' about this. Hmm'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ', 1 plus'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': " 1... That's a pretty"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' basic math question. 
I'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' remember from arithmetic that when'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' you add 1 and'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 1 together, the'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' result is 2.'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' But wait, maybe'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' I should double-check to be'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' sure. Let me visualize it'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': '. If I have one apple'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' and someone gives me another'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' apple, I have'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' two apples total. Yeah,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' that makes sense. 
Or'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' on a number line'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ', starting at 1 and'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' moving 1 step forward lands'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' you at 2'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': '. \n\nIs there any'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' context where 1 +'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 1 might not equal'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 2? Like in different'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' number bases? Let'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': "'s see. In base"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 10, which'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' is standard,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 1+1 is'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 2. 
But if'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' we were in binary'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' (base 2'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': '), 1 +'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 1 would be 1'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': '0. But the question'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': " doesn't specify a base,"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' so I think the'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' default is base 10'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': '. \n\nAlternatively, could'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' this be a trick'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' question? 
Maybe they'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': "'re referring to something else"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ', like in Boolean'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' algebra where 1 +'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 1 might still'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' be 1 in'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' some contexts? Wait'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ', no, in Boolean'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' addition, 1'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' + 1 is typically'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': " 1 because it's logical"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' OR. 
But the'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' question just says "1'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' plus 1," which is'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' more arithmetic than Boolean.'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' \n\nOr maybe in some other'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' mathematical structure like modular arithmetic?'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' For example, modulo'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 2,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 1 + 1 is'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 0. But again'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ', unless specified, it'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': "'s probably standard addition"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': '. 
\n\nThe user might be'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' testing if I know basic'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' math, or maybe'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': " they're a student just"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' starting out. Either way,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' the straightforward answer is'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 2. I should also'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': " consider if there's any cultural"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' references or jokes where'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 1 + 1 equals'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' something else, but I can'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': "'t think of any common"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' ones. 
\n\nAlternatively'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ', in some contexts like'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' in chemistry,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' 1 + 1 could refer'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' to mixing solutions, but that'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': "'s not standard. The question"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' is pretty simple,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' so I think the answer'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' is 2. 
To'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' be thorough, maybe mention'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' that in standard arithmetic it'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': "'s 2, but if"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': " there's a different"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' context, the answer'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' might vary. But since'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' no context is given'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ', 2 is the safest'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ' answer.'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='The result' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' of 1 plus' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' 1 is 2.' 
additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' \n\nIn standard arithmetic (base' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' 10), adding' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' 1 and 1 together' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' yields 2. This is' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' a fundamental mathematical principle. If' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' the question involves a different context' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' (e.g., binary' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=', modular arithmetic, or a' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' metaphorical meaning), it' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' would need clarification,' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' but under typical circumstances, the' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content=' answer is 2.' 
additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' content='' additional_kwargs={'reasoning_content': ''} response_metadata={'finish_reason': 'stop', 'request_id': '4738c641-6bd8-9efc-a4fe-d929d4e62bef', 'token_usage': {'input_tokens': 16, 'output_tokens': 560, 'total_tokens': 576}} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d' ``` Co-authored-by: ccurme <chester.curme@gmail.com>	2025-03-18 11:43:10 -04:00
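The streamed output above interleaves reasoning chunks (empty `content`, populated `reasoning_content`) with answer chunks. A consumer might accumulate the two streams separately, as in this minimal sketch; the `Chunk` dataclass is a hypothetical stand-in for the message chunk objects and mirrors only the two fields visible in the output.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Chunk:
    """Stand-in for a streamed message chunk (only the fields used here)."""

    content: str = ""
    additional_kwargs: dict = field(default_factory=dict)


def collect(chunks: Iterable[Chunk]) -> tuple[str, str]:
    """Return (reasoning, answer) assembled from a stream of chunks."""
    reasoning: list[str] = []
    answer: list[str] = []
    for chunk in chunks:
        # Chunks without a reasoning_content key contribute an empty string.
        reasoning.append(chunk.additional_kwargs.get("reasoning_content", ""))
        answer.append(chunk.content)
    return "".join(reasoning), "".join(answer)


stream = [
    Chunk(additional_kwargs={"reasoning_content": "Okay, so the"}),
    Chunk(additional_kwargs={"reasoning_content": " user is asking"}),
    Chunk(content="The result", additional_kwargs={"reasoning_content": ""}),
    Chunk(content=" of 1 plus 1 is 2."),
]
reasoning, answer = collect(stream)
```

Joining once at the end avoids repeated string concatenation when a response arrives as hundreds of small chunks.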
ccurme	b91daf06eb	groq[minor]: remove default model (#30341 ) The default model for `ChatGroq`, `"mixtral-8x7b-32768"`, is being retired on March 20, 2025. Here we remove the default, such that model names must be explicitly specified (being explicit is a good practice here, and avoids the need for breaking changes down the line). This change will be released in a minor version bump to 0.3. This follows https://github.com/langchain-ai/langchain/pull/30161 (released in version 0.2.5), where we began generating warnings to this effect. ![Screenshot 2025-03-18 at 10 33 27 AM](https://github.com/user-attachments/assets/f1e4b302-c62a-43b0-aa86-eaf9271e86cb)	2025-03-18 10:50:34 -04:00
amuwall	f6a17fbc56	community: fix import exception too constrictive (#30218 ) Fix this issue #30097	2025-03-17 22:09:02 -04:00
qonnop	036f00dc92	community: support in-memory data (Blob.from_data) in all audio parsers (#30262 ) OpenAIWhisperParser, OpenAIWhisperParserLocal, YandexSTTParser do not handle in-memory audio data (loaded via Blob.from_data) correctly. They require Blob.path to be set and AudioSegment is always read from the file system. In-memory data is handled correctly only for FasterWhisperParser so far. I changed OpenAIWhisperParser, OpenAIWhisperParserLocal, YandexSTTParser accordingly to match FasterWhisperParser. Thanks for reviewing the PR! Co-authored-by: qonnop <qonnop@users.noreply.github.com>	2025-03-17 19:52:33 -04:00
Matthew Farrellee	1985aaf095	langchain-tests: allow subclasses to add additional, non-standard tests (#30204 ) Description: the ChatModel[Integration]Tests classes are powerful and helpful; this change allows subclasses to add additional tests. For instance, ``` class TestChatMyServiceIntegration(ChatModelIntegrationTests): ... def test_myservice(self, model: BaseChatModel) -> None: ... ``` --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-03-17 23:37:16 +00:00
Ben	789db7398b	text-splitters: Add JSFrameworkTextSplitter for Handling JavaScript Framework Code (#28972 ) ## Description This pull request introduces a new text splitter, `JSFrameworkTextSplitter`, to the Langchain library. The `JSFrameworkTextSplitter` extends the `RecursiveCharacterTextSplitter` to handle JavaScript framework code effectively, including React (JSX), Vue, and Svelte. It identifies and utilizes framework-specific component tags and syntax elements as splitting points, alongside standard JavaScript syntax. This ensures that code is divided at natural boundaries, enhancing the parsing and processing of JavaScript and framework-specific code. ### Key Features - Supports React (JSX), Vue, and Svelte frameworks. - Identifies and uses framework-specific tags and syntax elements as natural splitting points. - Extends the existing `RecursiveCharacterTextSplitter` for seamless integration. ## Issue No specific issue addressed. ## Dependencies No additional dependencies required. --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-03-17 23:32:33 +00:00
ccurme	5684653775	openai[patch]: release 0.3.9 (#30325 )	2025-03-17 16:08:41 +00:00
ccurme	eb9b992aa6	openai[patch]: support additional Responses API features (#30322 ) - Include response headers - Max tokens - Reasoning effort - Fix bug with structured output / strict - Fix bug with simultaneous tool calling + structured output	2025-03-17 12:02:21 -04:00
Bae-ChangHyun	d8510270ee	community: add 'extract' mode to FireCrawlLoader for structured data extraction (#30242 ) Description: Added an 'extract' mode to FireCrawlLoader that enables structured data extraction from web pages. This feature allows users to extract structured data from a single URL or an entire website using Large Language Models (LLMs). More parameters and usage details are in the [firecrawl docs](https://docs.firecrawl.dev/features/extract-beta). Currently only one URL can be extracted at a time (this depends on firecrawl's extract method). Dependencies: No new dependencies required. Uses existing FireCrawl API capabilities. --------- Co-authored-by: chbae <chbae@gcsc.co.kr> Co-authored-by: ccurme <chester.curme@gmail.com>	2025-03-17 15:15:57 +00:00
qonnop	747efa16ec	community: fix CPU support for FasterWhisperParser (implicit compute type for WhisperModel) (#30263 ) FasterWhisperParser fails on a machine without an NVIDIA GPU: "Requested float16 compute type, but the target device or backend do not support efficient float16 computation." This problem arises because the WhisperModel is called with compute_type="float16", which works only for NVIDIA GPU. According to the [CTranslate2 docs](https://opennmt.net/CTranslate2/quantization.html#bit-floating-points-float16) float16 is supported only on NVIDIA GPUs. Removing the compute_type parameter solves the problem for CPUs. According to the [CTranslate2 docs](https://opennmt.net/CTranslate2/quantization.html#quantize-on-model-loading) setting compute_type to "default" (standard when omitting the parameter) uses the original compute type of the model or performs implicit conversion for the specific computation device (GPU or CPU). I suggest to remove compute_type="float16". @hulitaitai you are the original author of the FasterWhisperParser - is there a reason for setting the parameter to float16? Thanks for reviewing the PR! Co-authored-by: qonnop <qonnop@users.noreply.github.com>	2025-03-14 22:22:29 -04:00
ccurme	c74e7b997d	openai[patch]: support structured output via Responses API (#30265 ) Also runs all standard tests using Responses API.	2025-03-14 15:14:23 -04:00
Priyansh Agrawal	f54f14b747	community: cube document loader - do not load non-public dimensions and measures (#30286 ) - Description: Do not load non-public dimensions and measures (public: false) with the Cube semantic loader. - Issue: Currently, non-public dimensions and measures are loaded by the Cube document loader, so downstream applications end up using them, which Cube does not allow.	2025-03-14 15:07:56 -04:00
Stavros Kontopoulos	ac22cde130	langchain_ollama: Support keep_alive in embeddings (#30251 ) - Description: Adds support for keep_alive in Ollama Embeddings, see https://github.com/ollama/ollama/issues/6401. Builds on top of https://github.com/langchain-ai/langchain/pull/29296. My use case is keeping the embeddings model loaded on the CPU indefinitely. - Dependencies: no deps are being introduced. - Issue: haven't created an issue yet.	2025-03-14 14:56:50 -04:00
homeffjy	2c99f12062	community[patch]: fix bilibili loader handling of multi-page content (#30283 ) Previously the loader would only extract subtitles from the first page of multi-page videos.	2025-03-14 14:53:03 -04:00
ccurme	d5d0134e7b	anthropic: release 0.3.10 (#30287 )	2025-03-14 16:23:21 +00:00
ccurme	226f29bc96	anthropic: support built-in tools, improve docs (#30274 ) - Support features from recent update: https://www.anthropic.com/news/token-saving-updates (mostly adding support for built-in tools in `bind_tools` - Add documentation around prompt caching, token-efficient tool use, and built-in tools.	2025-03-14 16:18:50 +00:00
Priyansh Agrawal	f27e2d7ce7	community: cube document loader - fix logging (#30285 ) - Description: Fix bad log message on line#56 and replace f-string logs with format specifiers. - Issue: Log messages such as this one: `INFO:langchain_community.document_loaders.cube_semantic:Loading dimension values for: {dimension_name}...`	2025-03-14 11:36:18 -04:00
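The log line quoted in the commit above shows a literal `{dimension_name}`, which is what a format string without the `f` prefix produces. A minimal sketch of the format-specifier style the commit describes follows; the logger name `cube_demo` and the helper `log_loading` are hypothetical and are not taken from the loader's source.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("cube_demo")


def log_loading(dimension_name: str) -> None:
    # The %s placeholder is filled in by the logging module, and only if the
    # record is actually emitted, so no f-string is needed here.
    logger.info("Loading dimension values for: %s...", dimension_name)
```

Passing the value as an argument also means a forgotten `f` prefix can no longer leave the placeholder unformatted.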
ccurme	bbd4b36d76	mistralai[patch]: bump core (#30278 )	2025-03-13 23:04:36 +00:00
ccurme	315bb17ef5	core: release 0.3.45 (#30277 )	2025-03-13 22:44:23 +00:00
pulvedu	d0bfc7f820	community[fix] : Pass API_KEY as argument (#30272 ) PR Title: community: Fix Pass API_KEY as argument PR Message: Description: This PR fixes validation error "Value error, Did not find tavily_api_key, please add an environment variable `TAVILY_API_KEY` which contains it, or pass `tavily_api_key` as a named parameter." Dependencies: No new dependencies introduced. --------- Co-authored-by: pulvedu <dustin@tavily.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-13 22:19:31 +00:00
ccurme	733abcc884	mistral: release 0.2.8 (#30275 )	2025-03-13 21:54:34 +00:00
Jacob Lee	e9c1765967	fix(core): Ignore missing secrets on deserialization (#30252 )	2025-03-13 12:27:03 -07:00
ccurme	ebea5e014d	standard tests: test simple agent loop (#30268 )	2025-03-13 16:34:12 +00:00
ccurme	cd1ea8e94d	openai[patch]: support Responses API (#30231 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2025-03-12 12:25:46 -04:00
Jason Zhang	49bdd3b6fe	docs: Add AgentQL provider doc, tool/toolkit doc and documentloader doc (#30144 ) - Description: Added AgentQL docs for the provider page, tools page and documentloader page - Twitter handle: @AgentQL Repo: https://github.com/tinyfish-io/agentql-integrations/tree/main/langchain PyPI: https://pypi.org/project/langchain-agentql/ --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-11 21:57:40 -04:00
Vadym Barda	23fa70f328	core[patch]: release 0.3.44 (#30236 )	2025-03-11 18:59:02 -04:00
Vadym Barda	c7842730ef	core[patch]: support single-node subgraphs and put subgraph nodes under the respective subgraphs (#30234 )	2025-03-11 18:55:45 -04:00
ccurme	62c570dd77	standard-tests, openai: bump core (#30202 )	2025-03-10 19:22:24 +00:00
ccurme	f896e701eb	deepseek: install local langchain-tests in test deps (#30198 )	2025-03-10 16:58:17 +00:00
Hugh Gao	aa6dae4a5b	community: Remove the system message count limit for ChatTongyi. (#30192 ) ## Description The models in DashScope support multiple SystemMessages. Here is the [Doc](https://bailian.console.aliyun.com/model_experience_center/text#/model-market/detail/qwen-long?tabKey=sdk), and the example code on the document page: ```python import os from openai import OpenAI client = OpenAI( api_key=os.getenv("DASHSCOPE_API_KEY"), # If you have not configured the environment variable, replace this with your API key base_url="https://dashscope.aliyuncs.com/compatible-mode/v1", # Fill in the DashScope service base_url ) # Initialize the messages list completion = client.chat.completions.create( model="qwen-long", messages=[ {'role': 'system', 'content': 'You are a helpful assistant.'}, # Replace 'file-fe-xxx' with the file-id used in your actual conversation. {'role': 'system', 'content': 'fileid://file-fe-xxx'}, {'role': 'user', 'content': 'What is this article about?'} ], stream=True, stream_options={"include_usage": True} ) full_content = "" for chunk in completion: if chunk.choices and chunk.choices[0].delta.content: # Concatenate the output content full_content += chunk.choices[0].delta.content print(chunk.model_dump()) print({full_content}) ``` Tip: The example code is for OpenAI, but the document says that it also supports the DashScope API; I tested it, and it works. ``` Is the Dashscope SDK invocation method compatible? Yes, the Dashscope SDK remains compatible for model invocation. However, file uploads and file-ID retrieval are currently only supported via the OpenAI SDK. The file-ID obtained through this method is also compatible with Dashscope for model invocation. ```	2025-03-10 08:58:40 -04:00
ccurme	67aff1648b	community: Add OpenGradient integration (Toolkit) (#30190 ) Commandeering https://github.com/langchain-ai/langchain/pull/30135 --------- Co-authored-by: kylexqian <kylexqian@gmail.com>	2025-03-09 18:08:07 -04:00
ccurme	b209d46eb3	mistral[patch]: set global ssl context (#30189 )	2025-03-09 21:27:41 +00:00
Vijay Selvaraj	df459d0d5e	community: add Valthera integration (#30105 ) ```markdown Description: This PR integrates Valthera into LangChain, introducing a framework designed to have an LLM agent send highly personalized nudges. It is modeled after Dr. BJ Fogg's Behavior Model. This integration includes: - Custom data connectors for HubSpot, PostHog, and Snowflake. - A unified data aggregator that consolidates user data. - Scoring configurations to compute motivation and ability scores. - A reasoning engine that determines the appropriate user action. - A trigger generator to create personalized messages for user engagement. Issue: N/A Dependencies: N/A Twitter handle: - `@vselvarajijay` Tests and Docs: - `docs/docs/integrations/tools/valthera` - `https://github.com/valthera/langchain-valthera/tree/main/tests` ``` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-09 21:19:08 +00:00
ccurme	3823daa0b9	cli: update integration doc template for tools (#30188 ) Chain example -> langgraph agent	2025-03-09 21:14:43 +00:00
Jonathan Feng	911accf733	docs: add contextualai documentation (#30050 ) Description: adds ContextualAI's `langchain-contextual` package's documentation --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-09 02:43:13 +00:00
Bharat	b9746a6910	fixes#30182: update tool names to match OpenAI function name pattern (#30183 ) The OpenAI API requires function names to match the pattern '^[a-zA-Z0-9_-]+$'. This updates the JIRA toolkit's tool names to use underscores instead of spaces to comply with this requirement and prevent BadRequestError when using the tools with OpenAI functions. Error fixed: ``` File "langgraph-bug-fix/.venv/lib/python3.13/site-packages/openai/_base_client.py", line 1023, in _request raise self._make_status_error_from_response(err.response) from None openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'tools[0].function.name': string does not match pattern. Expected a string that matches the pattern '^[a-zA-Z0-9_-]+$'.", 'type': 'invalid_request_error', 'param': 'tools[0].function.name', 'code': 'invalid_value'}} During task with name 'agent' and id 'aedd7537-e8d5-6678-d0c5-98129586d3ac' ``` Issue:#30182	2025-03-08 20:48:25 -05:00
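The name constraint quoted in the error above can be checked locally before tools are sent to the API. The sketch below uses the pattern from the error message; the helper names `is_valid_tool_name` and `sanitize_tool_name`, and the sample tool name, are hypothetical and do not come from the JIRA toolkit.

```python
import re

# Pattern copied from the BadRequestError message quoted in the commit.
OPENAI_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_tool_name(name: str) -> bool:
    """Return True if the whole name matches the pattern."""
    return OPENAI_NAME_PATTERN.fullmatch(name) is not None


def sanitize_tool_name(name: str) -> str:
    """Hypothetical helper: replace spaces with underscores, as the commit does."""
    return name.replace(" ", "_")


if __name__ == "__main__":
    original = "Create Issue"  # illustrative name containing a space
    print(is_valid_tool_name(original))
    print(is_valid_tool_name(sanitize_tool_name(original)))
```

This only handles spaces; other characters outside the pattern would need their own replacement rule.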
ccurme	cee0fecb08	docs: update package registry counts (#30181 )	2025-03-08 20:37:59 -05:00
William FH	bac3a28e70	Flush (#30157 )	2025-03-07 16:32:15 -08:00
ccurme	a7ab5e8372	community[patch]: ChatPerplexity: track usage metadata (#30175 )	2025-03-07 23:25:05 +00:00
ccurme	1c993b921c	core[patch]: release 0.3.43 (#30173 )	2025-03-07 21:56:00 +00:00
ccurme	9893e5cb80	core[patch]: catch structured_output_format (#30172 ) Change to `ls_structured_output_format` was not backward-compatible with older versions of integration packages.	2025-03-07 16:50:06 -05:00
ccurme	33a3510243	core[patch]: export ArgsSchema (#30169 ) This is needed for type hints see: https://github.com/langchain-ai/langchain/pull/30167	2025-03-07 20:43:05 +00:00
ccurme	17507c9ba6	groq[patch]: release 0.2.5 (#30168 )	2025-03-07 20:25:51 +00:00
andyzhou1982	9e863c89d2	add JiebaLinkExtractor for chinese doc extracting (#30150 ) - Description: add jieba_link_extractor.py for Chinese document extraction - Dependencies: jieba - Tests and docs: /doc/doc/integrations/providers/jieba.md /doc/doc/integrations/vectorstores/jieba_link_extractor.ipynb /libs/packages.yml --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-07 20:21:46 +00:00
ccurme	74e7772a5f	groq[patch]: warn if model is not specified (#30161 ) Groq is retiring `mixtral-8x7b-32768`, which is currently the default model for ChatGroq, on March 20. Here we emit a warning if the model is not specified explicitly. A version 0.3.0 will be released ahead of March 20 that removes the default altogether.	2025-03-07 15:21:13 -05:00
Ioannis Bakagiannis	3444e587ee	docs: Integration Update - ADS4GPTs (#30153 ) docs: New integration for LangChain - ads4gpts-langchain Description: Tools and Toolkit for Agentic integration natively within LangChain with ADS4GPTs, in order to help applications monetize with advertising. Twitter handle: @ads4gpts Co-authored-by: knitlydevaccount <loom+github@knitly.app>	2025-03-07 14:35:44 -05:00
ccurme	3c258194ae	tests[patch]: release 0.3.14 (#30165 )	2025-03-07 18:34:05 +00:00
ccurme	34638ccfae	openai[patch]: release 0.3.8 (#30164 )	2025-03-07 18:26:40 +00:00
ccurme	4e5058f29c	core[patch]: release 0.3.42 (#30163 )	2025-03-07 18:14:45 +00:00
Eugene Yurtsev	894fd63a61	cli: release 0.0.36 (#30159 ) Bump for 0.0.36	2025-03-07 13:05:40 -05:00
ccurme	806211475a	core[patch]: update structured output tracing (#30123 ) - Trace JSON schema in `options` - Rename to `ls_structured_output_format`	2025-03-07 13:05:25 -05:00
ccurme	230876a7c5	anthropic[patch]: add PDF input example to API reference (#30156 )	2025-03-07 14:19:08 +00:00
joeconstantino	022ff9eead	Tableau docs for new datasource qa tool (#30125 ) - Description: a notebook showing langchain and langraph agents using the new langchain_tableau tool - Twitter handle: @joe_constantin0 --------- Co-authored-by: Joe Constantino <joe@constantino.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-06 14:58:56 +00:00
ccurme	52b0570bec	core, openai, standard-tests: improve OpenAI compatibility with Anthropic content blocks (#30128 ) - Support thinking blocks in core's `convert_to_openai_messages` (pass through instead of error) - Ignore thinking blocks in ChatOpenAI (instead of error) - Support Anthropic-style image blocks in ChatOpenAI --- Standard integration tests include a `supports_anthropic_inputs` property which is currently enabled only for tests on `ChatAnthropic`. This test enforces compatibility with message histories of the form: ``` - system message - human message - AI message with tool calls specified only through `tool_use` content blocks - human message containing `tool_result` and an additional `text` block ``` It additionally checks support for Anthropic-style image inputs if `supports_image_inputs` is enabled. Here we change this test, such that if you enable `supports_anthropic_inputs`: - You support AI messages with text and `tool_use` content blocks - You support Anthropic-style image inputs (if `supports_image_inputs` is enabled) - You support thinking content blocks. That is, we add a test case for thinking content blocks, but we also remove the requirement of handling tool results within HumanMessages (motivated by existing agent abstractions, which should all return ToolMessage). We move that requirement to a ChatAnthropic-specific test.	2025-03-06 09:53:14 -05:00
Pat Patterson	b3dc66f7a3	community: fix AttributeError when creating LanceDB vectorstore (#30127 ) Description: This PR adds a call to `guard_import()` to fix an AttributeError raised when creating LanceDB vectorstore instance with an existing LanceDB table. Issue: This PR fixes issue #30124. Dependencies: No additional dependencies. Twitter handle: [@metadaddy](https://x.com/metadaddy), but I spend more time at [@metadaddy.net](https://bsky.app/profile/metadaddy.net) these days. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-05 23:04:38 +00:00
Hugh Gao	9b7b8e4a1a	community: make DashScope models support Partial Mode for text continuation. (#30108 ) ## Description Make DashScope models support Partial Mode for text continuation. ChatTongyi supports continuing text from a prefix by adding a "partial" argument to the AIMessage. The documentation is [Partial Mode ](https://help.aliyun.com/zh/model-studio/user-guide/partial-mode?spm=a2c4g.11186623.help-menu-2400256.d_1_0_0_8.211e5b77KMH5Pn&scm=20140722.H_2862210._.OR_help-T_cn~zh-V_1). The API example is: ```py import os import dashscope messages = [{ "role": "user", "content": "Please continue the sentence 'Spring has arrived, and the earth' to express the beauty of spring and the author's joy" }, { "role": "assistant", "content": "Spring has arrived, and the earth", "partial": True }] response = dashscope.Generation.call( api_key=os.getenv("DASHSCOPE_API_KEY"), model='qwen-plus', messages=messages, result_format='message', ) print(response.output.choices[0].message.content) ``` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-05 16:22:14 +00:00
黑牛	f0153414d5	Add request_id field to improve request tracking and debugging (for Tongyi model) (#30110 ) - Description: Added the request_id field to the check_response function to improve request tracking and debugging, applicable for the Tongyi model. - Issue: None - Dependencies: None - Twitter handle: None - Add tests and docs: None - Lint and test: Ran `make format`, `make lint`, and `make test` to ensure the code meets formatting and testing requirements.	2025-03-05 11:03:47 -05:00
Manthan Surkar	1ee8aceaee	community: fix Jira API wrapper failing initialization with cloud param (#30117 ) ### Description Converts the boolean `jira_cloud` parameter in the Jira API Wrapper to a string before initializing the Jira Client. Also adds tests for the same. ### Issue [Jira API Wrapper Bug](`8abb65e138/libs/community/langchain_community/utilities/jira.py (L47)`) ```python jira_cloud_str = get_from_dict_or_env(values, "jira_cloud", "JIRA_CLOUD") jira_cloud = jira_cloud_str.lower() == "true" ``` The above code has a bug where the value of `"jira_cloud"` is a boolean. If it is passed, calling `.lower()` on a boolean raises an error. Additionally, `False` cannot be passed explicitly since `get_from_dict_or_env` falls back to environment variables. Relevant code in `langchain_core`: [Source](https://github.com/thesmallstar/langchain/blob/master/.venv/lib/python3.13/site-packages/langchain_core/utils/env.py#L46) ```python if isinstance(key, str) and key in data and data[key]: # Here, data[key] is False ``` This PR fixes both issues. ### Twitter Handle [Manthan Surkar](https://x.com/manthan_surkar)	2025-03-05 10:49:25 -05:00
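The two problems described in the commit above can be reproduced without Jira. In the sketch below, `lookup_skipping_falsy` is a simplified stand-in for the lookup condition quoted in the message, not the actual `langchain_core` function, and `to_bool` is a hypothetical coercion helper. The environment variable name `JIRA_CLOUD_DEMO` is also invented for the demonstration.

```python
import os


def lookup_skipping_falsy(data: dict, key: str, env_key: str, default=None):
    """Simplified stand-in: a falsy value in `data` is treated as missing."""
    if isinstance(key, str) and key in data and data[key]:
        return data[key]
    # An explicit False falls through to the environment variable.
    return os.environ.get(env_key, default)


def to_bool(value) -> bool:
    """Hypothetical helper: accept either a bool or a string such as 'true'."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


if __name__ == "__main__":
    os.environ["JIRA_CLOUD_DEMO"] = "true"
    # The explicit False is ignored and the environment value wins.
    print(lookup_skipping_falsy({"jira_cloud": False}, "jira_cloud", "JIRA_CLOUD_DEMO"))
    # Calling .lower() on a bool fails, which is the first bug described.
    try:
        True.lower()
    except AttributeError as exc:
        print("AttributeError:", exc)
    print(to_bool(True), to_bool("False"))
```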
Adrián Panella	c599ba47d5	core(mermaid): fix error when 3+ subgraph levels (#29970 )	2025-03-04 13:27:49 -05:00
Alexander Henlein	417efa30a6	docs: add Taiga Tool integration docs (#30042 ) This PR adds documentation for the langchain-taiga Tool integration, including an example notebook at 'docs/docs/integrations/tools/taiga.ipynb' and updates to 'libs/packages.yml' to track the new package. Issue: N/A Dependencies: None Twitter handle: N/A --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-03-04 17:51:20 +00:00
Mathias Marciano	5f0102242a	Fixed an issue with the OpenAI Assistant's 'retrieval' tool and adding support for the 'attachments' parameter (#30006 ) PR Title: langchain: add attachments support in OpenAIAssistantRunnable PR Description: This PR fixes an issue with the "retrieval" tool (internally named "file_search") in the OpenAI Assistant by adding support for the "attachments" parameter in the invoke method. This change allows files to be linked to messages when they are inserted into threads, which is essential for utilizing OpenAI's Retrieval Augmented Generation (RAG) feature. Issue: N/A Dependencies: None Twitter handle: N/A --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-03-04 17:34:11 +00:00
Philippe PRADOS	4710c1fa8c	community[minor]: Fix regular expression in visualize and outlines modules. (#30002 ) Fix invalid escape characters	2025-03-04 12:23:48 -05:00
ccurme	577c0d0715	community[patch]: release 0.3.19 (#30104 )	2025-03-04 16:12:03 +00:00
ccurme	ba5ddb218f	anthropic[patch]: release 0.3.9 (#30103 )	2025-03-04 10:53:55 -05:00
ccurme	9383a0536a	tests[patch]: release 0.3.13 (#30102 )	2025-03-04 10:53:43 -05:00
ccurme	fb16c25920	langchain[patch]: release 0.3.20 (#30101 )	2025-03-04 15:47:27 +00:00
ccurme	692a68bf1c	core[patch]: release 0.3.41 (#30100 )	2025-03-04 15:08:57 +00:00
ccurme	484d945500	community[patch]: remove numpy cap for python < 3.12 (#30084 )	2025-03-04 09:46:41 -05:00
ZhangShenao	8575d7491f	[Doc] Improve api doc (#30073 ) - Update api_doc for `BaseMessage` - add static method decorator for `retry_runnable`	2025-03-04 09:39:07 -05:00
Samuel Dion-Girardeau	ccb64e9f4f	docs: Fix typo in code samples for max_tokens_for_prompt (#30088 ) - Description: Fix typo in code samples for max_tokens_for_prompt. Code blocks had singular "token" but the method has plural "tokens". - Issue: N/A - Dependencies: N/A - Twitter handle: N/A	2025-03-04 09:11:21 -05:00
ArrayPD	c671d54c6f	core: make with_alisteners() example workable. (#30059 ) Description: Five fixes to the example for with_alisteners() in libs/core/langchain_core/runnables/base.py. Replaces the incoherent example output with the output of the working example. 1. SyntaxError: unterminated string literal. print(f"on start callback starts at {format_t(time.time())} is corrected to print(f"on start callback starts at {format_t(time.time())}") 2. SyntaxError: unterminated string literal. print(f"on end callback starts at {format_t(time.time())} is corrected to print(f"on end callback starts at {format_t(time.time())}") 3. NameError: name 'Runnable' is not defined. Fixed by adding from langchain_core.runnables import Runnable 4. NameError: name 'asyncio' is not defined. Fixed by adding import asyncio 5. NameError: name 'format_t' is not defined. Fixed by implementing format_t() as from datetime import datetime, timezone def format_t(timestamp: float) -> str: return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()	2025-03-01 15:39:02 -05:00
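The `format_t()` helper from item 5 of the commit above is flattened onto one line in this log. Laid out as runnable code it looks like the sketch below; only the `__main__` usage lines are an addition for illustration.

```python
import time
from datetime import datetime, timezone


def format_t(timestamp: float) -> str:
    # Convert a Unix timestamp to an ISO 8601 string in UTC.
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    # Mirrors the corrected print statement from items 1 and 2.
    print(f"on start callback starts at {format_t(time.time())}")
```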
cold-eye	7c175e3fda	Update ascend.py (#30060 ) Add batch_size to fix OOM when embedding a large number of texts.	2025-03-01 14:10:41 -05:00
ccurme	3b066dc005	anthropic[patch]: allow structured output when thinking is enabled (#30047 ) Structured output will currently always raise a BadRequestError when Claude 3.7 Sonnet's `thinking` is enabled, because we rely on forced tool use for structured output and this feature is not supported when `thinking` is enabled. Here we: - Emit a warning if `with_structured_output` is called when `thinking` is enabled. - Raise `OutputParserException` if no tool calls are generated. This is arguably preferable to raising an error in all cases. ```python from langchain_anthropic import ChatAnthropic from pydantic import BaseModel class Person(BaseModel): name: str age: int llm = ChatAnthropic( model="claude-3-7-sonnet-latest", max_tokens=5_000, thinking={"type": "enabled", "budget_tokens": 2_000}, ) structured_llm = llm.with_structured_output(Person) # <-- this generates a warning ``` ```python structured_llm.invoke("Alice is 30.") # <-- works ``` ```python structured_llm.invoke("Hello!") # <-- raises OutputParserException ```	2025-02-28 14:44:11 -05:00
ccurme	f8ed5007ea	anthropic, mistral: return `model_name` in response metadata (#30048 ) Took a "census" of models supported by init_chat_model-- of those that return model names in response metadata, these were the only two that had it keyed under `"model"` instead of `"model_name"`.	2025-02-28 18:56:05 +00:00
Christophe Bornet	9e6ffd1264	core: Add ruff rules PTH (pathlib) (#29338 ) See https://docs.astral.sh/ruff/rules/#flake8-use-pathlib-pth Co-authored-by: ccurme <chester.curme@gmail.com>	2025-02-28 13:22:20 -05:00
TheSongg	86b364de3b	Add asynchronous generate interface (#30001 ) - [ ] PR title: [langchain_community.llms.xinference]: Add asynchronous generate interface - [ ] PR message: The asynchronous generate interface support stream data and non-stream data. chain = prompt \| llm async for chunk in chain.astream(input=user_input): yield chunk - [ ] Add tests and docs: from langchain_community.llms import Xinference from langchain.prompts import PromptTemplate llm = Xinference( server_url="http://0.0.0.0:9997", # replace your xinference server url model_uid={model_uid} # replace model_uid with the model UID return from launching the model stream = True ) prompt = PromptTemplate(input=['country'], template="Q: where can we visit in the capital of {country}? A:") chain = prompt \| llm async for chunk in chain.astream(input=user_input): yield chunk	2025-02-28 12:32:44 -05:00
Fakai Zhao	f07338d2bf	Implementing the MMR algorithm for OLAP vector storage (#30033 ) - Implementing the MMR algorithm for OLAP vector storage: - Supports the Apache Doris and StarRocks OLAP databases. - Example: "vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 10})" --------- Co-authored-by: fakzhao <fakzhao@cisco.com>	2025-02-28 08:50:22 -05:00
Daniel Rauber	186cd7f1a1	community: PlaywrightURLLoader should wait for page load event before attempting to extract data (#30043 ) ## Description The PlaywrightURLLoader should wait for a page to be loaded before attempting to extract data.	2025-02-28 08:45:51 -05:00
ccurme	0dbcc1d099	docs: document anthropic features (#30030 ) Update integrations page with extended thinking feature. Update API reference with extended thinking and citations.	2025-02-27 19:37:04 -05:00
ccurme	6c7c8a164f	openai[patch]: add unit test (#30022 ) Test `max_completion_tokens` is propagated to payload for AzureChatOpenAI.	2025-02-27 11:09:17 -05:00
DamonXue	156a60013a	docs: fix tavily_search code-block format. (#30012 ) This pull request includes a change to the `TavilySearchResults` class in the `tool.py` file, which updates the code block format in the documentation. Documentation update: * `libs/community/langchain_community/tools/tavily_search/tool.py`: Changed the code block format from Python to JSON in the example provided in the docstring.	2025-02-27 10:55:15 -05:00
kawamou	8977ac5ab0	community[fix]: Handle None value in raw_content from Tavily API response (#30021 ) ## Description: When using the Tavily retriever with include_raw_content=True, the retriever occasionally fails with a Pydantic ValidationError because raw_content can be None. The Document model in langchain_core/documents/base.py requires page_content to be a non-None value, but the Tavily API sometimes returns None for raw_content. This PR fixes the issue by ensuring that even when raw_content is None, an empty string is used instead: ```python page_content=result.get("content", "") if not self.include_raw_content else (result.get("raw_content") or ""), ```	2025-02-27 10:53:53 -05:00
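The `or ""` in the commit above matters because `dict.get(key, default)` only falls back to the default when the key is absent, not when the key is present with a `None` value. A small sketch of the difference follows; `pick_page_content` is a hypothetical helper written for this illustration, not a function in the retriever.

```python
def pick_page_content(result, include_raw_content):
    """Choose the text for a document, never returning None."""
    if not include_raw_content:
        return result.get("content", "")
    # .get("raw_content", "") would still return None when the key exists
    # with a None value, so fall back with `or` instead.
    return result.get("raw_content") or ""


# Hypothetical API result in which raw_content is present but None.
api_result = {"content": "summary", "raw_content": None}
with_default_only = api_result.get("raw_content", "")
safe_text = pick_page_content(api_result, include_raw_content=True)
```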
Lakindu Boteju	f69deee1bd	community: Add cost data for aws bedrock anthropic.claude-3-7 model (#30016 ) This pull request includes updates to the `libs/community/langchain_community/callbacks/bedrock_anthropic_callback.py` file to add a new model version to the list of supported models. Updates to supported models: * Added support for the `anthropic.claude-3-7-sonnet-20250219-v1:0` model with a rate of `0.003` for 1000 input tokens. * Added support for the `anthropic.claude-3-7-sonnet-20250219-v1:0` model with a rate of `0.015` for 1000 output tokens. AWS Bedrock pricing reference : https://aws.amazon.com/bedrock/pricing	2025-02-27 09:51:52 -05:00
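As a worked example of the rates quoted in the commit above (0.003 per 1,000 input tokens and 0.015 per 1,000 output tokens), the arithmetic can be sketched as below. The function name `estimate_cost` and the token counts are assumptions for illustration, and the currency is presumed to be USD as on the AWS pricing page; this is not the callback's actual code.

```python
INPUT_RATE_PER_1K = 0.003   # per 1,000 input tokens, as stated in the commit message
OUTPUT_RATE_PER_1K = 0.015  # per 1,000 output tokens, as stated in the commit message


def estimate_cost(input_tokens, output_tokens):
    """Estimate the cost of one call from its token counts (hypothetical helper)."""
    input_cost = (input_tokens / 1000) * INPUT_RATE_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_RATE_PER_1K
    return input_cost + output_cost


# 2,000 input tokens cost 0.006 and 500 output tokens cost 0.0075.
cost = estimate_cost(input_tokens=2000, output_tokens=500)
```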
Lakindu Boteju	e0e9e560b3	PyMuPDF4LLM integration to LangChain (#29953 ) ## PyMuPDF4LLM integration to LangChain for PDF content extraction in Markdown format ### Description [PyMuPDF4LLM](https://github.com/pymupdf/RAG) makes it easier to extract PDF content in Markdown format, needed for LLM & RAG applications. (License: GNU Affero General Public License v3.0) [langchain-pymupdf4llm](https://github.com/lakinduboteju/langchain-pymupdf4llm) integrates PyMuPDF4LLM to LangChain as a Document Loader. (License: MIT License) This pull request introduces the integration of [PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) into the LangChain project as an integration package: [`langchain-pymupdf4llm`](https://github.com/lakinduboteju/langchain-pymupdf4llm). The most important changes include adding new Jupyter notebooks to document the integration and updating the package configuration file to include the new package. ### Documentation: * `docs/docs/integrations/providers/pymupdf4llm.ipynb`: Added a new Jupyter notebook to document the integration of `PyMuPDF4LLM` with LangChain, including installation instructions and class imports. * `docs/docs/integrations/document_loaders/pymupdf4llm.ipynb`: Added a new Jupyter notebook to document the usage of `langchain-pymupdf4llm` as a LangChain integration package in detail. ### Package registration: * `libs/packages.yml`: Updated the package configuration file to include the `langchain-pymupdf4llm` package. ### Additional information * Related to: https://github.com/langchain-ai/langchain/pull/29848 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-26 15:59:12 -05:00
Dan Mirsky	d98c3f76c2	core[patch]: Fix FileCallbackHandler name resolution, Fixes #29941 (#29942 ) - Description: Same changes as #26593 but for FileCallbackHandler - Issue: Fixes #29941 - Dependencies: None - Twitter handle: None	2025-02-26 14:54:24 -05:00
Christophe Bornet	b3885c124f	core: Add ruff rules TC (#29268 ) See https://docs.astral.sh/ruff/rules/#flake8-type-checking-tc Some fixes done for TC001,TC002 and TC003 but these rules are excluded since they don't play well with Pydantic. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-26 19:39:05 +00:00
talos	9cd20080fc	community: Update SQLiteVec table trigger (#29914 ) Issue: The trigger could only be used by the first table created; additional triggers could not be created for other tables. Fix: Update the trigger name so that it can be used for new tables. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-26 15:10:13 +00:00
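One plausible reading of the commit above is a trigger-name collision: in SQLite, trigger names must be unique within a database, so a fixed name can only be created once. The sketch below uses the standard-library `sqlite3` module and plain tables rather than sqlite-vec; the table names, the `_log` tables, and `create_table_with_trigger` are all hypothetical.

```python
import sqlite3


def create_table_with_trigger(conn, table):
    """Create a text table plus an insert trigger whose name includes the table name.

    `table` is interpolated into SQL, so it must come from trusted code:
    identifiers cannot be passed as bound parameters.
    """
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, text TEXT)")
    conn.execute(f"CREATE TABLE {table}_log (rowid_seen INTEGER)")
    # A fixed trigger name would collide as soon as a second table is set up,
    # so derive the name from the table.
    conn.execute(
        f"CREATE TRIGGER {table}_after_insert AFTER INSERT ON {table} "
        f"BEGIN INSERT INTO {table}_log VALUES (new.id); END"
    )


conn = sqlite3.connect(":memory:")
create_table_with_trigger(conn, "docs_a")
create_table_with_trigger(conn, "docs_b")  # succeeds because the names differ
conn.execute("INSERT INTO docs_b (text) VALUES ('hello')")
logged = conn.execute("SELECT COUNT(*) FROM docs_b_log").fetchone()[0]
trigger_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'trigger'")
)
```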
ccurme	7562677f3f	langchain[patch]: delete erroneous lock file (#30007 ) Picked up during merge.	2025-02-26 15:01:05 +00:00
Erick Friis	3c96012f5e	langchain: make numpy optional (#29182 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-26 14:35:24 +00:00
Artem Yankov	6177b9f9ab	community: add title, score and raw_content to tavily search results (#29995 ) Description: Tavily search results returned from the API include useful information such as title, score and (optionally) raw_content, which the wrapper omits even though it is documented there. Add this data to the result structure. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-25 23:27:21 +00:00
Eugene Yurtsev	b525226531	core[patch]: version 0.3.40 (#29997 ) Version 0.3.40 release	2025-02-25 23:09:40 +00:00
Vadym Barda	0fc50b82a0	core[patch]: allow passing description to @tool decorator (#29976 )	2025-02-25 17:45:36 -05:00
Naveen SK	21bfc95e14	docs: Correct grammatical typos in various documentation files (#29983 ) Description: Fixed grammatical typos in various documentation files Issue: N/A Dependencies: N/A Twitter handle: @MrNaveenSK Co-authored-by: ccurme <chester.curme@gmail.com>	2025-02-25 19:13:31 +00:00
ccurme	1158d3134d	langchain[patch]: remove aiohttp (#29991 ) My guess is this was left over from when `community` was in langchain.	2025-02-25 11:43:00 -05:00
ccurme	afd7888392	langchain[patch]: remove explicit dependency on tenacity (#29990 ) Not used anywhere in `langchain`, already a dependency of langchain-core.	2025-02-25 11:31:55 -05:00
ccurme	32704f0ad8	langchain: update extended test (#29988 )	2025-02-25 14:58:20 +00:00
Yan	47e1a384f7	Writer partners integration docs (#29961 ) Documentation of Writer provider and additional features * [PyPi langchain-writer web-page](https://pypi.org/project/langchain-writer/) * [GitHub langchain-writer repo](https://github.com/writer/langchain-writer) --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-24 19:30:09 -05:00
ccurme	79f5bbfb26	anthropic[patch]: release 0.3.8 (#29973 )	2025-02-24 15:24:35 -05:00
ccurme	ded886f622	anthropic[patch]: support claude 3.7 sonnet (#29971 )	2025-02-24 15:17:47 -05:00
Bagatur	d00d645829	docs[patch]: update disable_streaming docstring (#29968 )	2025-02-24 18:40:31 +00:00
ccurme	b7a1705052	openai[patch]: release 0.3.7 (#29967 )	2025-02-24 11:59:28 -05:00
ccurme	5437ee385b	core[patch]: release 0.3.39 (#29966 )	2025-02-24 11:47:01 -05:00
ccurme	291a232fb8	openai[patch]: set global ssl context (#29932 ) We set ```python global_ssl_context = ssl.create_default_context(cafile=certifi.where()) ``` at the module-level and share it among httpx clients.	2025-02-24 11:25:16 -05:00
ccurme	9ce07980b7	core[patch]: pydantic 2.11 compat (#29963 ) Resolves https://github.com/langchain-ai/langchain/issues/29951 Was able to reproduce the issue with Anthropic installing from pydantic `main` and correct it with the fix recommended in the issue. Thanks very much @Viicos for finding the bug and the detailed writeup!	2025-02-24 11:11:25 -05:00
ccurme	0d3a3b99fc	core[patch]: release 0.3.38 (#29962 )	2025-02-24 15:04:53 +00:00
ccurme	b1a7f4e106	core, openai[patch]: support serialization of pydantic models in messages (#29940 ) Resolves https://github.com/langchain-ai/langchain/issues/29003, https://github.com/langchain-ai/langchain/issues/27264 Related: https://github.com/langchain-ai/langchain-redis/issues/52 ```python from langchain.chat_models import init_chat_model from langchain.globals import set_llm_cache from langchain_community.cache import SQLiteCache from pydantic import BaseModel cache = SQLiteCache() set_llm_cache(cache) class Temperature(BaseModel): value: int city: str llm = init_chat_model("openai:gpt-4o-mini") structured_llm = llm.with_structured_output(Temperature) ``` ```python # 681 ms response = structured_llm.invoke("What is the average temperature of Rome in May?") ``` ```python # 6.98 ms response = structured_llm.invoke("What is the average temperature of Rome in May?") ```	2025-02-24 09:34:27 -05:00
ccurme	927ec20b69	openai[patch]: update system role to developer for o-series models (#29785 ) Some o-series models will raise a 400 error for `"role": "system"` (`o1-mini` and `o1-preview` will raise, `o1` and `o3-mini` will not). Here we update `ChatOpenAI` to update the role to `"developer"` for all model names matching `^o\d`. We only make this change on the ChatOpenAI class (not BaseChatOpenAI).	2025-02-24 08:59:46 -05:00
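The role rewrite described in the commit above can be sketched with the quoted pattern `^o\d`. This is an illustration on plain message dictionaries, not the `ChatOpenAI` implementation; `adjust_roles` and the sample messages are hypothetical.

```python
import re

O_SERIES = re.compile(r"^o\d")  # pattern quoted in the commit message


def adjust_roles(model_name, messages):
    """Return copies of messages, renaming "system" to "developer" for o-series models."""
    if not O_SERIES.match(model_name):
        return [dict(m) for m in messages]
    return [
        {**m, "role": "developer"} if m.get("role") == "system" else dict(m)
        for m in messages
    ]


msgs = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]
o_series_roles = [m["role"] for m in adjust_roles("o1-mini", msgs)]
other_roles = [m["role"] for m in adjust_roles("gpt-4o", msgs)]
```

Because the pattern is anchored at the start of the name, a model such as `gpt-4o` is left untouched even though it contains an "o".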
Ahmed Tammaa	8b511a3a78	[Exception Handling] DeepSeek JSONDecodeError (#29758 ) For context, please see #29626. The DeepSeek integration uses langchain_openai, and the error surfaces as a `json decode error`. I added a handler that gives a more sensible error message: DeepSeek API returned empty/invalid json. Reproducing the issue is a bit challenging because it is inconsistent: sometimes DeepSeek returns valid data and at other times it returns invalid data, which triggers the JSON decode error. This PR adds exception handling but is not an ultimate fix for the issue. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-23 15:00:32 -05:00
Julien Elkaim	e586bffe51	community: Repair embeddings/llamacpp's embed_query method (#29935 ) Description: As commented on commit `41b6a86bbe`, that commit introduced a bug that occurs when we make an embedding request and the model returns a non-nested list. This is typically the case for the model _nomic-embed-text_. - I added a unit test, and ran `make format`, `make lint` and `make test` from the `community` package. - No new dependency. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-23 19:32:17 +00:00
Saraswathy Kalaiselvan	5ca4933b9d	docs: updated ChatLiteLLM model_kwargs description (#29937 ) - Description: updated the description of the model_kwargs parameter, which wrongly described temperature. - Issue: #29862 - Dependencies: N/A --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-23 19:27:13 +00:00
ccurme	512eb1b764	anthropic[patch]: update models for integration tests (#29938 )	2025-02-23 14:23:48 -05:00
Christophe Bornet	f6d4fec4d5	core: Add ruff rules ANN (type annotations) (#29271 ) See https://docs.astral.sh/ruff/rules/#flake8-annotations-ann The interest compared to only mypy is that ruff is very fast at detecting missing annotations. ANN101 and ANN102 are deprecated so we ignore them ANN401 (no Any type) ignored to be in sync with mypy config --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-02-22 17:46:28 -05:00
Bagatur	979a991dc2	core[patch]: dont deep copy merge_message_runs (#28454 ) afaict no need to deep copy here, if we merge messages then we convert them to chunks first anyways	2025-02-22 21:56:45 +00:00
Mohammad Mohtashim	afa94e5bf7	`_wait_for_run` calling fix for `OpenAIAssistantRunnable` (#29927 ) - Description: Fixed the `OpenAIAssistantRunnable` call of `_wait_for_run` - Issue: #29923	2025-02-22 00:27:24 +00:00
Vadym Barda	437fe6d216	core[patch]: return ToolMessage from tools when tool call ID is empty string (#29921 )	2025-02-21 11:53:15 -05:00
Taofiq Aiyelabegan	5ee8a8f063	[Integration]: Langchain-Permit (#29867 ) ## Which area of LangChain is being modified? - This PR adds a new "Permit" integration to the `docs/integrations/` folder. - Introduces two new Tools (`LangchainJWTValidationTool` and `LangchainPermissionsCheckTool`) - Introduces two new Retrievers (`PermitSelfQueryRetriever` and `PermitEnsembleRetriever`) - Adds demo scripts in `examples/` showcasing usage. ## Description of Changes - Created `langchain_permit/tools.py` for JWT validation and permission checks with Permit. - Created `langchain_permit/retrievers.py` for custom Permit-based retrievers. - Added documentation in `docs/integrations/providers/permit.ipynb` (or `.mdx`) to explain setup, usage, and examples. - Provided sample scripts in `examples/demo_scripts/` to illustrate usage of these tools and retrievers. - Ensured all code is linted and tested locally. Thank you again for reviewing! --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-21 10:59:00 -05:00
Jean-Philippe Dournel	ebe38baaf9	community/mlx_pipeline: fix crash at mlx call (#29915 ) - Description: Since mlx_lm 0.20, all calls to mlx crash due to deprecation of the way parameters are passed to the methods generate and generate_step. The parameters top_p, temp, repetition_penalty and repetition_context_size are no longer passed directly to those methods but are wrapped into "sampler" and "logit_processor". - Dependencies: mlx_lm (optional) - Tests: I've added a new test to the existing test file: tests/integration_tests/llms/test_mlx_pipeline.py --------- Co-authored-by: Jean-Philippe Dournel <jp@insightkeeper.io>	2025-02-21 09:14:53 -05:00
ccurme	1fa9f6bc20	docs: build mongo in api ref (#29908 )	2025-02-20 19:58:35 -05:00
Chaunte W. Lacewell	d972c6d6ea	partners: add langchain-vdms (#29857 ) Description: Deprecate vdms in community, add integration langchain-vdms, and update any related files Issue: n/a Dependencies: langchain-vdms Twitter handle: n/a --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-20 19:48:46 -05:00
Mohammad Mohtashim	8293142fa0	mistral[patch]: support model_kwargs (#29838 ) - Description: Frequency_penalty added as a client parameter - Issue: #29803 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-20 18:47:39 -05:00
ccurme	924d9b1b33	cli[patch]: fix retriever template (#29907 ) Chat model tabs don't render correctly in .ipynb template.	2025-02-20 17:51:19 +00:00
Brayden Zhong	a70f31de5f	Community: RankLLMRerank AttributeError (Handle list-based rerank results) (#29840 ) # community: Fix AttributeError in RankLLMRerank (`list` object has no attribute `candidates`) ## Description This PR fixes an issue in `RankLLMRerank` where reranking fails with the following error: ``` AttributeError: 'list' object has no attribute 'candidates' ``` The issue arises because `rerank_batch()` returns a `List[Result]` instead of an object containing `.candidates`. ### Changes Introduced - Adjusted `compress_documents()` to support both: - Old API format: `rerank_results.candidates` - New API format: `rerank_results` as a list - Also fixed incorrect .txt location parsing. --- ## Issue Fixes AttributeError in `RankLLMRerank` when using `compression_retriever.invoke()`. The issue is observed when `rerank_batch()` returns a list instead of an object with `.candidates`. ## Dependencies - No additional dependencies introduced. --- ## Checklist - [x] Backward compatible with previous API versions - [x] Tested locally with different RankLLM models - [x] No new dependencies introduced - [x] Linted with `make format && make lint` - [x] Ready for review --- ## Testing - Ran `compression_retriever.invoke(query)`	2025-02-20 12:38:31 -05:00
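Supporting both result shapes named in the commit above can be sketched with a simple attribute check. This is an illustration of the compatibility pattern only; `extract_candidates` and the stand-in objects are hypothetical and do not reproduce `compress_documents()`.

```python
from types import SimpleNamespace


def extract_candidates(rerank_results):
    """Accept either an object exposing .candidates or a plain list of results."""
    if hasattr(rerank_results, "candidates"):
        return list(rerank_results.candidates)
    return list(rerank_results)


# Stand-ins for the two API shapes described in the commit message.
old_style = SimpleNamespace(candidates=["doc-1", "doc-2"])
new_style = ["doc-1", "doc-2"]
from_old = extract_candidates(old_style)
from_new = extract_candidates(new_style)
```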
Levon Ghukasyan	ec403c442a	Separate deepale vector store (#29902 ) --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-20 17:37:19 +00:00
Jorge Piedrahita Ortiz	3acf842e35	core: add sambanova chat models to load module mapping (#29855 ) - Description: add sambanova integration package chat models to load module mapping, to allow serialization and deserialization	2025-02-20 12:30:50 -05:00
ccurme	d227e4a08e	mistralai[patch]: release 0.2.7 (#29906 )	2025-02-20 17:27:12 +00:00
Hande	d8bab89e6e	community: add cognee retriever (#29878 ) This PR adds a new cognee integration: knowledge-graph-based retrieval that enables developers to ingest documents into cognee’s knowledge graph, process them, and then retrieve context via CogneeRetriever. It includes: - a langchain_cognee package with a CogneeRetriever class - a test for the integration, demonstrating how to create, process, and retrieve with cognee - an example notebook showing its use, located in the `docs/docs/integrations` directory. Thank you for the review! --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-20 17:15:23 +00:00
dokato	92b415a9f6	community: Made some Jira fields optional for agent to work correctly (#29876 ) Description: Two small changes are proposed here: (1) The previous code assumed that every issue has a priority field and raised a KeyError when it was missing. The code now checks whether priority exists before accessing it and assigns None if it is missing, which prevents runtime errors when processing issues without a priority. (2) Similarly, the code raised a KeyError when the "style" field was missing; `.get("style", None)` safely retrieves the value if present. Issue: #29875 Dependencies: N/A	2025-02-20 12:10:11 -05:00
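The defensive access described in the commit above can be sketched as follows. The issue layout (`fields`, `priority`, `name`) is an assumption modeled on typical Jira payloads, and `summarize_issue` is a hypothetical helper rather than the wrapper's code.

```python
def summarize_issue(issue):
    """Extract a few fields, tolerating issues that lack a priority."""
    fields = issue.get("fields", {})
    # "priority" may be absent or None, so avoid indexing into it directly.
    priority = (fields.get("priority") or {}).get("name")
    return {"key": issue.get("key"), "priority": priority}


with_priority = summarize_issue({"key": "A-1", "fields": {"priority": {"name": "High"}}})
without_priority = summarize_issue({"key": "A-2", "fields": {}})
null_priority = summarize_issue({"key": "A-3", "fields": {"priority": None}})
```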
am-kinetica	ca7eccba1f	Handled a bug around empty query results differently (#29877 ) community: vectorstores/kinetica - Bugfix for empty query results handling: - Description: check the number of records returned by a query before processing further - Issue: empty results previously raised an `AttributeError`, which is now fixed @efriis	2025-02-20 12:07:49 -05:00
Antonio Pisani	2c403a3ea9	docs: Add langchain-prolog documentation (#29788 ) I want to add documentation for a new integration with SWI-Prolog. @hwchase17 check this out: https://github.com/apisani1/langchain-prolog/tree/main/examples/travel_agent --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-20 11:50:28 -05:00
Marlene	be7fa920fa	Partner: Azure AI Langchain Docs and Package Registry (#29879 ) This PR adds documentation for the Azure AI package in Langchain to the main mono-repo No issue connected or updated dependencies. Utilises existing tests and makes updates to the docs --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-20 14:35:26 +00:00
Hankyeol Kyung	2dd0ce3077	openai: Update reasoning_effort arg documentation (#29897 ) Description: Update docstring for `reasoning_effort` argument to specify that it applies to reasoning models only (e.g., OpenAI o1 and o3-mini), clarifying its supported models. Issue: None Dependencies: None	2025-02-20 09:03:42 -05:00
ccurme	ed3c2bd557	core[patch]: set version="v2" as default in astream_events (#29894 )	2025-02-19 23:21:37 +00:00
Fabian Blatz	a2d05a376c	community: ConfluenceLoader: add a filter method for attachments (#29882 ) Adds an `attachment_filter_func` parameter to the ConfluenceLoader class which can be used to determine which files are indexed. This is useful if you are interested in excluding files based on their media type or other metadata.	2025-02-19 18:20:45 -05:00
ccurme	9ed47a4d63	community[patch]: release 0.3.18 (#29896 )	2025-02-19 20:13:00 +00:00
ccurme	92889edafd	core[patch]: release 0.3.37 (#29895 )	2025-02-19 20:04:35 +00:00
ccurme	ffd6194060	core[patch]: de-beta rate limiters (#29891 )	2025-02-19 19:19:59 +00:00
ccurme	fb4c8423f0	docs: fix builds (#29890 ) Missed in https://github.com/langchain-ai/langchain/pull/29889	2025-02-19 13:35:59 -05:00
ccurme	68b13e5172	pinecone: delete from monorepo (#29889 ) This now lives in https://github.com/langchain-ai/langchain-pinecone	2025-02-19 12:55:15 -05:00
Erick Friis	6c1e21d128	core: basemessage.text() (#29078 )	2025-02-18 17:45:44 -08:00
Eugene Yurtsev	8e5074d82d	core: release 0.3.36 (#29869 ) Release 0.3.36	2025-02-18 19:51:43 +00:00
Vadym Barda	d04fa1ae50	core[patch]: allow passing JSON schema as args_schema to tools (#29812 )	2025-02-18 14:44:31 -05:00
ccurme	5034a8dc5c	xai[patch]: release 0.2.1 (#29854 )	2025-02-17 14:30:41 -05:00
ccurme	83dcef234d	xai[patch]: support dedicated structured output feature (#29853 ) https://docs.x.ai/docs/guides/structured-outputs Interface appears identical to OpenAI's. ```python from langchain.chat_models import init_chat_model from pydantic import BaseModel class Joke(BaseModel): setup: str punchline: str llm = init_chat_model("xai:grok-2").with_structured_output( Joke, method="json_schema" ) llm.invoke("Tell me a joke about cats.") ```	2025-02-17 14:19:51 -05:00
ccurme	9d6fcd0bfb	infra: add xai to scheduled testing (#29852 )	2025-02-17 18:59:45 +00:00
ccurme	8a3b05ae69	langchain[patch]: release 0.3.19 (#29851 )	2025-02-17 13:36:23 -05:00
ccurme	c9061162a1	langchain[patch]: add xai to extras (#29850 )	2025-02-17 17:49:34 +00:00
Bagatur	1acf57e9bd	langchain[patch]: init_chat_model xai support (#29849 )	2025-02-17 09:45:39 -08:00
hsm207	037b129b86	weaviate: Add-deprecation-warning (#29757 ) - Description: add deprecation warning when using weaviate from langchain_community - Issue: NA - Dependencies: NA - Twitter handle: NA --------- Signed-off-by: hsm207 <hsm207@users.noreply.github.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-16 21:42:18 -05:00
Đỗ Quang Minh	cd198ac9ed	community: add custom model for OpenAIWhisperParser (#29831 ) Add `model` properties for OpenAIWhisperParser. Defaulted to `whisper-1` (previous value). Please help me update the docs and other related components of this repo.	2025-02-16 21:26:07 -05:00
Cole McIntosh	6874c9c1d0	docs: add notebook for langchain-salesforce package (#29800 ) Description: This PR adds a Jupyter notebook that explains the features, installation, and usage of the [`langchain-salesforce`](https://github.com/colesmcintosh/langchain-salesforce) package. The notebook includes: - Setup instructions for configuring Salesforce credentials - Example code demonstrating common operations such as querying, describing objects, creating, updating, and deleting records Issue: N/A Dependencies: No new dependencies are required. Tests and Docs: - Added an example notebook demonstrating the usage of the `langchain-salesforce` package, located in `docs/docs/integrations`. Lint and Test: - Ran `make format`, `make lint`, and `make test` successfully. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-16 08:34:57 -05:00
Jan Heimes	60f58df5b3	community: add top_k as param to Needle Retriever (#29821 ) This PR adds top_k as a param to the Needle Retriever. By default we use top 10.	2025-02-16 08:30:52 -05:00
Jesus Fernandez Bes	1dfac909d8	community: Adding IN Operator to AzureCosmosDBNoSQLVectorStore (#29805 ) - Description: I have added a new operator to the operator map with key `$in` and value `IN`, so that you can define filters using lists as values. List values were already anticipated, but because the IN operator was not in the map they could not be used. - Issue: Fixes #29804. - Dependencies: No extra.	2025-02-15 21:44:54 -05:00
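To illustrate how an operator map entry such as `$in` to `IN` turns a list-valued filter into a query condition, here is a minimal sketch. Only the `$in`/`IN` pair comes from the commit message; the other map entries, the `c.` alias, and `build_condition` are assumptions, and this is not the vector store's query builder. Real code should prefer parameterized queries over string building.

```python
import json

# Illustrative subset; only "$in" -> "IN" is taken from the commit message.
OPERATOR_MAP = {"$eq": "=", "$gt": ">", "$lt": "<", "$in": "IN"}


def build_condition(field, op, value):
    """Render one filter as a SQL-like condition string (hypothetical helper)."""
    sql_op = OPERATOR_MAP[op]
    if op == "$in":
        # A list value becomes a parenthesized, comma-separated literal list.
        items = ", ".join(json.dumps(v) for v in value)
        return f"c.{field} {sql_op} ({items})"
    return f"c.{field} {sql_op} {json.dumps(value)}"


in_clause = build_condition("category", "$in", ["news", "blog"])
gt_clause = build_condition("year", "$gt", 2020)
```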
Wahed Hemati	8901b113c3	docs: add Discord integration docs (#29822 ) This PR adds documentation for the `langchain-discord-shikenso` integration, including an example notebook at `docs/docs/integrations/tools/discord.ipynb` and updates to `libs/packages.yml` to track the new package. Issue: N/A Dependencies: None Twitter handle: N/A --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-15 21:43:45 -05:00
Krishna Kulkarni	a98c5f1c4b	langchain_community: add image support to DuckDuckGoSearchAPIWrapper (#29816 ) - Description: This PR enhances the DuckDuckGoSearchAPIWrapper within the langchain_community package by introducing support for image searches. The enhancement includes: - Adding a new method _ddgs_images to handle image search queries. - Updating the run and results methods to process and return image search results appropriately. - Modifying the source parameter to accept "images" as a valid option, alongside "text" and "news". - Dependencies: No additional dependencies are required for this change.	2025-02-15 21:32:14 -05:00
Iris Liu	0d9f0b4215	docs: updates Chroma integration API ref docs (#29826 ) - Description: updates Chroma integration API ref docs - Issue: #29817 - Dependencies: N/A - Twitter handle: @irieliu Co-authored-by: Iris <liuirisny@gmail.com>	2025-02-15 21:05:21 -05:00
ccurme	3fe7c07394	openai[patch]: release 0.3.6 (#29824 )	2025-02-15 13:53:35 -05:00
ccurme	65a6dce428	openai[patch]: enable streaming for o1 (#29823 ) Verified streaming works for the `o1-2024-12-17` snapshot as well.	2025-02-15 12:42:05 -05:00
Christophe Bornet	3dffee3d0b	all: Bump blockbuster version to 1.5.18 (#29806 ) Has fixes for running on Windows and non-CPython runtimes.	2025-02-14 07:55:38 -08:00
ccurme	d9a069c414	tests[patch]: release 0.3.12 (#29797 )	2025-02-13 23:57:44 +00:00
ccurme	e4f106ea62	groq[patch]: remove xfails (#29794 ) These appear to pass.	2025-02-13 15:49:50 -08:00
Erick Friis	f34e62ef42	packages: add langchain-xai (#29795 ) wasn't registered per the contribution guide: https://python.langchain.com/docs/contributing/how_to/integrations/	2025-02-13 15:36:41 -08:00
ccurme	49cc6106f7	tests[patch]: fix query for test_tool_calling_with_no_arguments (#29793 )	2025-02-13 23:15:52 +00:00
Erick Friis	1a225fad03	multiple: fix uv path deps (#29790 ) The file:// format wasn't working with updates: it doesn't install as an editable dep. Move to tool.uv.sources with path= instead.	2025-02-13 21:32:34 +00:00
Erick Friis	ff13384eb6	packages: update counts, add command (#29789 )	2025-02-13 20:45:25 +00:00
HackHuang	76d32754ff	core : update the class docs of InMemoryVectorStore in in_memory.py (#29781 ) - Description: Add a new section on inspecting `store` to the class docs in in_memory.py, which is useful for beginners. ```python Check Documents: .. code-block:: python for doc in vector_store.store.values(): print(doc) ``` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-13 16:41:47 +00:00
Mohammad Mohtashim	96ad09fa2d	(Community): Added API Key for Jina Search API Wrapper (#29622 ) - Description: Simple change for adding the API Key for Jina Search API Wrapper - Issue: #29596	2025-02-12 20:12:07 -08:00
ccurme	f1c66a3040	docs: minor fix to provider table (#29771 ) Langfair renders as LangfAIr	2025-02-13 04:06:58 +00:00
Jakub Kopecký	c8cb7c25bf	docs: update apify integration (#29553 ) Description: Fixed and updated Apify integration documentation to use the new [langchain-apify](https://github.com/apify/langchain-apify) package. Twitter handle: @apify	2025-02-12 20:02:55 -08:00
ccurme	16fb1f5371	chroma[patch]: release 0.2.2 (#29769 ) Resolves https://github.com/langchain-ai/langchain/issues/29765	2025-02-13 02:39:16 +00:00
Mohammad Mohtashim	2310847c0f	(Chroma): Small Fix in `add_texts` when checking for embeddings (#29766 ) - Description: Small fix in `add_texts` to make sure embedding nullability is checked properly. - Issue: #29765 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-13 02:26:13 +00:00
Eric Pinzur	716fd89d8e	docs: contributed `Graph RAG` Retriever integration (#29744 ) Description: This adds the `Graph RAG` Retriever integration documentation, per https://python.langchain.com/docs/contributing/how_to/integrations/. * The integration exists in this public repository: https://github.com/datastax/graph-rag * We've implemented the standard langchain tests for retrievers: https://github.com/datastax/graph-rag/blob/main/packages/langchain-graph-retriever/tests/test_langchain.py * Our integration is published to PyPi: https://pypi.org/project/langchain-graph-retriever/	2025-02-12 18:25:48 -08:00
Sunish Sheth	f42dafa809	Deprecating sql_database access for creating UC functions for agent tools (#29745 ) --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-02-13 02:24:44 +00:00
Thor 雷神 Schaeff	a0970d8d7e	[WIP] chore: update ElevenLabs tool. (#29722 ) Thank you for contributing to LangChain! - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-13 01:54:34 +00:00
Chaymae El Aattabi	4b08a7e8e8	Fix #29759 : Use local chunk_size_ for looping in embed_documents (#29761 ) This fix ensures that the chunk size is correctly determined when processing text embeddings. Previously, the code did not properly handle cases where chunk_size was None, potentially leading to incorrect chunking behavior. Now, chunk_size_ is explicitly set to either the provided chunk_size or the default self.chunk_size, ensuring consistent chunking. This update improves reliability when processing large text inputs in batches and prevents unintended behavior when chunk_size is not specified. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-13 01:28:26 +00:00
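The fallback described in #29761 can be sketched in isolation. This is a minimal illustration rather than the library's code: `ToyEmbedder` is a hypothetical stand-in that returns the batches themselves instead of embeddings, and only the idea of resolving a local `chunk_size_` from either the argument or `self.chunk_size` comes from the commit description.

```python
from typing import List, Optional


class ToyEmbedder:
    """Hypothetical stand-in for an embeddings class with a default chunk size."""

    def __init__(self, chunk_size: int = 2) -> None:
        self.chunk_size = chunk_size

    def embed_documents(
        self, texts: List[str], chunk_size: Optional[int] = None
    ) -> List[List[str]]:
        # Resolve the effective size once, then loop with the local value
        # so that a None argument never reaches range().
        chunk_size_ = chunk_size or self.chunk_size
        batches = []
        for start in range(0, len(texts), chunk_size_):
            batches.append(texts[start : start + chunk_size_])
        return batches
```

Calling `embed_documents` without `chunk_size` uses the instance default, while an explicit value overrides it for that call only.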
Sunish Sheth	043d78d85d	Deprecate langhchain community ucfunctiontoolkit in favor for databricks_langchain (#29746 )	2025-02-12 15:50:35 -08:00
Hugues Chocart	e4eec9e9aa	community: add langchain-abso documentation (#29739 ) Add the documentation for the community package `langchain-abso`. It provides a new Chat Model class, that uses https://abso.ai --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2025-02-12 19:57:33 +00:00
ccurme	e61f463745	core[patch]: release 0.3.35 (#29764 )	2025-02-12 18:13:10 +00:00
Nuno Campos	fe59f2cc88	core: Fix output of convert_messages when called with BaseMessage.model_dump() (#29763 ) - additional_kwargs was being nested twice - example, response_metadata was placed inside additional_kwargs	2025-02-12 10:05:33 -08:00
Jacob Lee	f4e3e86fbb	feat(langchain): Infer o3 modelstrings passed to init_chat_model as OpenAI (#29743 )	2025-02-11 16:51:41 -08:00
Mohammad Mohtashim	9f3bcee30a	(Community): Adding Structured Support for ChatPerplexity (#29361 ) - Description: Adding Structured Support for ChatPerplexity - Issue: #29357 - This is implemented as per the Perplexity official docs: https://docs.perplexity.ai/guides/structured-outputs --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-02-11 15:51:18 -08:00
Jawahar S	994c5465e0	feat: add support for IBM WatsonX AI chat models (#29688 ) Description: Updated init_chat_model to support Granite models deployed on IBM WatsonX Dependencies: [langchain-ibm](https://github.com/langchain-ai/langchain-ibm) Tagging @baskaryan @efriis for review when you get a chance.	2025-02-11 15:34:29 -08:00
Shailendra Mishra	c7d74eb7a3	Oraclevs integration (#29723 ) community: langchain_community/vectorstore/oraclevs.py - Description: Refactored code to allow a connection or a connection pool. - Issue: Normally an idle connection is terminated by the server-side listener at timeout, so a user has to re-instantiate the vector store. The timeout for a single connection is not configurable. The solution is to use a connection pool, where a user can specify a user-defined timeout and the connections are managed by the pool. - Dependencies: None - Add tests and docs: This is not a new integration. A user can pass either a connection or a connection pool. The determination of what is passed is made at run time. Everything should work as before. - Lint and test: Already done. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-02-11 14:56:55 -08:00
ccurme	42ebf6ae0c	deepseek[patch]: release 0.1.2 (#29742 )	2025-02-11 11:53:43 -08:00
ccurme	ec55553807	pinecone[patch]: release 0.2.3 (#29741 )	2025-02-11 19:27:39 +00:00
ccurme	001cf99253	pinecone[patch]: add support for python 3.13 (#29737 )	2025-02-11 11:20:21 -08:00
ccurme	ba8f752bf5	openai[patch]: release 0.3.5 (#29740 )	2025-02-11 19:20:11 +00:00
ccurme	9477f49409	openai, deepseek: make _convert_chunk_to_generation_chunk an instance method (#29731 ) 1. Make `_convert_chunk_to_generation_chunk` an instance method on BaseChatOpenAI 2. Override on ChatDeepSeek to add `"reasoning_content"` to message additional_kwargs. Resolves https://github.com/langchain-ai/langchain/issues/29513	2025-02-11 11:13:23 -08:00
ccurme	d0c2dc06d5	mongodb[patch]: fix link in readme (#29738 )	2025-02-11 18:19:59 +00:00
zzaebok	3b3d52206f	community: change wikidata rest api version from v0 to v1 (#29708 ) Description: According to the [wikidata documentation](https://www.wikidata.org/wiki/Wikidata_talk:REST_API), Wikibase REST API version 1 (stable) is released from November 11, 2024. Their guide is to use the new v1 API and, it just requires replacing v0 in the routes with v1 in almost all cases. So I replaced WIKIDATA_REST_API_URL from v0 to v1 for stable usage. Co-authored-by: ccurme <chester.curme@gmail.com>	2025-02-10 17:12:38 -08:00
ccurme	4a389ef4c6	community: fix extended testing (#29715 ) v0.3.100 of premai sdk appears to break on import: `89d9276cbf/premai/api/__init__.py (L230)`	2025-02-10 16:57:34 -08:00
Bhav Sardana	624216aa64	community:Fix for Pydantic model validator of GoogleApiYoutubeLoader (#29694 ) - Description: Community: bugfix for Pydantic model validator for GoogleApiYoutubeLoader - Issue: #29165, #27432 Fix is similar to #29346	2025-02-10 08:57:58 -05:00
Changyong Um	60740c44c5	community: Add configurable text key for indexing and the retriever in Pinecone Hybrid Search (#29697 ) issue In Langchain, the original content is generally stored under the `text` key. However, the `PineconeHybridSearchRetriever` searches the `context` field in the metadata and cannot change this key. To address this, I have modified the code to allow changing the key to something other than context. In my opinion, following Langchain's conventions, the `text` key seems more appropriate than `context`. However, since I wasn't sure about the author's intent, I have left the default value as `context`.	2025-02-10 08:56:37 -05:00
manukychen	3de445d521	using getattr and default value to prevent 'OpenSearchVectorSearch' has no attribute 'bulk_size' (#29682 ) - Description: Adding getattr methods and set default value 500 to cls.bulk_size, it can prevent the error below: Error: type object 'OpenSearchVectorSearch' has no attribute 'bulk_size' - Issue: https://github.com/langchain-ai/langchain/issues/29071	2025-02-08 14:39:57 -05:00
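The `getattr`-with-default pattern mentioned in #29682 looks roughly like the sketch below. `ToyVectorStore` and `resolve_bulk_size` are hypothetical names, and letting a keyword argument override the default is an assumption made for illustration; only the default of 500 and the use of `getattr` come from the commit description.

```python
class ToyVectorStore:
    """Hypothetical class that does not define `bulk_size` at class level."""

    @classmethod
    def resolve_bulk_size(cls, **kwargs: int) -> int:
        # getattr with a default avoids
        # "type object ... has no attribute 'bulk_size'".
        default = getattr(cls, "bulk_size", 500)
        return kwargs.get("bulk_size", default)
```

A plain `cls.bulk_size` lookup would raise `AttributeError` here, because the attribute only exists on classes or instances that set it.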
Yao Tianjia	5d581ba22c	langchain: support the situation when action_input is null in json output_parser (#29680 ) Description: This PR fixes handling of null action_input in [langchain.agents.output_parser]. Previously, passing null as action_input could raise an OutputParserException with an unclear error message, leaving the LLM unable to tell how to correct the action. The changes include: - Added null-check validation before processing action_input - Implemented proper fallback behavior with default values - Maintained backward compatibility with existing implementations Error example: ``` { "action":"some action", "action_input":null } ``` Issue: None Dependencies: None	2025-02-07 22:01:01 -05:00
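A null check of the kind #29680 describes might look like the following. This is a hedged sketch, not the parser's actual code: `parse_action` is a hypothetical helper, and falling back to an empty dict is an assumed default, since the commit message does not say which default value is used.

```python
import json
from typing import Any, Dict, Tuple


def parse_action(text: str) -> Tuple[str, Dict[str, Any]]:
    """Parse an agent action, treating a null or missing action_input as empty."""
    payload = json.loads(text)
    action = payload["action"]
    action_input = payload.get("action_input")
    # JSON null becomes None in Python; substitute an assumed default
    # instead of failing later with an unclear error.
    if action_input is None:
        action_input = {}
    return action, action_input
```

With the example payload from the commit message, this returns the action name together with an empty input rather than raising.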
Philippe PRADOS	beb75b2150	community[minor]: 05 - Refactoring PyPDFium2 parser (#29625 ) This is one part of a larger Pull Request (PR) that is too large to be submitted all at once. This specific part focuses on updating the PyPDFium2 parser. For more details, see https://github.com/langchain-ai/langchain/pull/28970.	2025-02-07 21:31:12 -05:00
Christophe Bornet	723031d548	community: Bump ruff version to 0.9 (#29206 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2025-02-08 01:21:10 +00:00
Christophe Bornet	30f6c9f5c8	community: Use Blockbuster to detect blocking calls in asyncio during tests (#29609 ) Same as https://github.com/langchain-ai/langchain/pull/29043 for langchain-community. Dependencies: - blockbuster (test) Twitter handle: cbornet_ Co-authored-by: Erick Friis <erick@langchain.dev>	2025-02-08 01:10:39 +00:00
Christophe Bornet	3a57a28daa	langchain: Use Blockbuster to detect blocking calls in asyncio during tests (#29616 ) Same as https://github.com/langchain-ai/langchain/pull/29043 for the langchain package. Dependencies: - blockbuster (test) Twitter handle: cbornet_ --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-02-08 01:08:15 +00:00
Keenan Pepper	c67d473397	core: Make abatch_as_completed respect max_concurrency (#29426 ) - Description: Add tests for respecting max_concurrency and implement it for abatch_as_completed so that test passes - Issue: #29425 - Dependencies: none - Twitter handle: keenanpepper	2025-02-07 16:51:22 -08:00
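One common way to make an as-completed loop respect a concurrency limit is to guard each call with a semaphore. The sketch below shows that general technique with plain `asyncio`; `bounded_as_completed` is a hypothetical helper and is not claimed to match how `abatch_as_completed` is implemented in langchain-core.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Tuple


async def bounded_as_completed(
    func: Callable[[int], Awaitable[int]],
    inputs: List[int],
    max_concurrency: int,
) -> AsyncIterator[Tuple[int, int]]:
    """Yield (index, result) pairs as they finish, at most max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(index: int, value: int) -> Tuple[int, int]:
        # Only max_concurrency coroutines can hold the semaphore at a time.
        async with semaphore:
            return index, await func(value)

    tasks = [asyncio.ensure_future(run_one(i, v)) for i, v in enumerate(inputs)]
    for finished in asyncio.as_completed(tasks):
        yield await finished
```

Each result is paired with its input index because completion order is not guaranteed to match input order.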
Aaron V	dcfaae85d2	Core: Fix __add__ for concatting two BaseMessageChunk's (#29531 ) Description: The change allows you to use the overloaded `+` operator correctly when `+`ing two BaseMessageChunk subclasses. Without this you must instantiate a subclass for it to work. Which feels... wrong. Base classes should be decoupled from sub classes and should have in no way a dependency on them. Issue: You can't `+` a BaseMessageChunk with a BaseMessageChunk e.g. this will explode ```py from langchain_core.outputs import ( ChatGenerationChunk, ) from langchain_core.messages import BaseMessageChunk chunk1 = ChatGenerationChunk( message=BaseMessageChunk( type="customChunk", content="HI", ), ) chunk2 = ChatGenerationChunk( message=BaseMessageChunk( type="customChunk", content="HI", ), ) # this will throw new_chunk = chunk1 + chunk2 ``` In case anyone ran into this issue themselves, it's probably best to use the AIMessageChunk: a la ```py from langchain_core.outputs import ( ChatGenerationChunk, ) from langchain_core.messages import AIMessageChunk chunk1 = ChatGenerationChunk( message=AIMessageChunk( content="HI", ), ) chunk2 = ChatGenerationChunk( message=AIMessageChunk( content="HI", ), ) # No explosion! new_chunk = chunk1 + chunk2 ``` Dependencies: None! Twitter handle: `aaron_vogler` Co-authored-by: Erick Friis <erick@langchain.dev>	2025-02-08 00:43:36 +00:00
Marlene	4fa3ef0d55	Community/Partner: Adding Azure community and partner user agent to better track usage in Python (#29561 ) - This pull request includes various changes to add a `user_agent` parameter to Azure OpenAI, Azure Search and Whisper in the Community and Partner packages. This helps in identifying the source of API requests so we can better track usage and help support the community better. I will also be adding the user_agent to the new `langchain-azure` repo as well. - No issue connected or updated dependencies. - Utilises existing tests and docs --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-02-07 23:28:30 +00:00
Ella Charlaix	c401254770	huggingface: Add ipex support to HuggingFaceEmbeddings (#29386 ) ONNX and OpenVINO models are available by specifying the `backend` argument (the model is loaded using `optimum` https://github.com/huggingface/optimum) ```python from langchain_huggingface import HuggingFaceEmbeddings embedding = HuggingFaceEmbeddings( model_name=model_id, model_kwargs={"backend": "onnx"}, ) ``` With this PR we also enable the IPEX backend ```python from langchain_huggingface import HuggingFaceEmbeddings embedding = HuggingFaceEmbeddings( model_name=model_id, model_kwargs={"backend": "ipex"}, ) ```	2025-02-07 15:21:09 -08:00
Bruno Alvisio	3eaf561561	core: Handle unterminated escape character when parsing partial JSON (#29065 ) Description Currently, when parsing a partial JSON, if a string ends with the escape character, the whole key/value is removed. For example: ``` >>> from langchain_core.utils.json import parse_partial_json >>> my_str = '{"foo": "bar", "baz": "qux\\' >>> >>> parse_partial_json(my_str) {'foo': 'bar'} ``` My expectation (and with this fix) would be for `parse_partial_json()` to return: ``` >>> from langchain_core.utils.json import parse_partial_json >>> >>> my_str = '{"foo": "bar", "baz": "qux\\' >>> parse_partial_json(my_str) {'foo': 'bar', 'baz': 'qux'} ``` Notes: 1. It could be argued that current behavior is still desired. 2. I have experienced this issue when the streaming output from an LLM and the chunk happens to end with `\\` 3. I haven't included tests. Will do if change is accepted. 4. This is specially troublesome when this function is used by `187131c55c/libs/core/langchain_core/output_parsers/transform.py (L111)` since what happens is that, for example, if the received sequence of chunks are: `{"foo": "b` , `ar\\` : Then, the result of calling `self.parse_result()` is: ``` {"foo": "b"} ``` and the second time: ``` {} ``` Co-authored-by: Erick Friis <erick@langchain.dev>	2025-02-07 23:18:21 +00:00
Viren	252cf0af10	docs: add LangFair as a provider (#29390 ) Description: - Add `docs/docs/providers/langfair.mdx` - Register langfair in `libs/packages.yml` Twitter handle: @LangFair Tests and docs 1. Integration tests not needed as this PR only adds a .mdx file to docs. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Dylan Bouchard <dylan.bouchard@cvshealth.com> Co-authored-by: Dylan Bouchard <109233938+dylanbouchard@users.noreply.github.com> Co-authored-by: Erick Friis <erickfriis@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2025-02-07 21:27:37 +00:00
Erick Friis	eb9eddae0c	docs: use init_chat_model (#29623 )	2025-02-07 12:39:27 -08:00
ccurme	bff25b552c	community: release 0.3.17 (#29676 )	2025-02-07 19:41:44 +00:00
ccurme	01314c51fa	langchain: release 0.3.18 (#29654 )	2025-02-07 13:40:26 -05:00
ccurme	92e2239414	openai[patch]: make parallel_tool_calls explicit kwarg of bind_tools (#29669 ) Improves discoverability and documentation. cc @vbarda	2025-02-07 13:34:32 -05:00
Marc Ammann	5690575f13	openai: Removed tool_calls from completion chunk after other chunks have already been sent. (#29649 ) - Description: Before sending a completion chunk at the end of an OpenAI stream, removing the tool_calls as those have already been sent as chunks. - Issue: - - Dependencies: - - Twitter handle: - @ccurme as mentioned in another PR --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-07 10:15:52 -05:00
Ikko Eltociear Ashimine	0d45ad57c1	community: update base_o365.py (#29657 ) extention -> extension	2025-02-07 08:43:29 -05:00
Vincent Emonet	3645181d0e	qdrant: Add `similarity_search_with_score_by_vector()` function to the `QdrantVectorStore` (#29641 ) Added `similarity_search_with_score_by_vector()` function to the `QdrantVectorStore` class. It is required when we want to query multiple times with the same embeddings. It was present in the now deprecated original `Qdrant` vectorstore implementation, but was absent from the new one. It is also implemented in a number of other `VectorStore` implementations. I have added tests for this new function. Note that I also argued in this discussion that it should be part of the general `VectorStore`: https://github.com/langchain-ai/langchain/discussions/29638 Co-authored-by: Erick Friis <erick@langchain.dev>	2025-02-07 00:55:58 +00:00
ccurme	488cb4a739	anthropic: release 0.3.7 (#29653 )	2025-02-06 17:05:57 -05:00
ccurme	ab09490c20	openai: release 0.3.4 (#29652 )	2025-02-06 17:02:21 -05:00
ccurme	29a0c38cc3	openai[patch]: add test for message.name (#29651 )	2025-02-06 16:49:28 -05:00
ccurme	91cca827c0	tests: release 0.3.11 (#29648 )	2025-02-06 21:48:09 +00:00
Sunish Sheth	25ce1e211a	docs: Updating the imports for langchain-databricks to databricks-langchain (#29646 )	2025-02-06 13:28:07 -08:00
ccurme	e1b593ae77	text-splitters[patch]: release 0.3.6 (#29647 )	2025-02-06 16:16:05 -05:00
ccurme	a91e58bc10	core: release 0.3.34 (#29644 )	2025-02-06 15:53:56 -05:00
Vincent Emonet	08b9eaaa6f	community: improve FastEmbedEmbeddings support for ONNX execution provider (e.g. GPU) (#29645 ) I changed how GPU support is implemented in `FastEmbedEmbeddings` to be more consistent with the existing `langchain-qdrant` sparse embeddings implementation. It now lets the user provide the list of ONNX execution providers directly: https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/fastembed_sparse.py#L15 This is a bit less clear to a user who just wants to enable GPU, but it gives more capabilities to work with execution providers other than the `CUDAExecutionProvider`, and is more future proof. Sorry for the disturbance @ccurme > Nice to see you just moved to `uv`! It is so much nicer to run format/lint/test! No need to manually rerun the `poetry install` with all required extras now	2025-02-06 15:31:23 -05:00
ccurme	3450bfc806	infra: add UV_FROZEN to makefiles (#29642 ) These are set in Github workflows, but forgot to add them to most makefiles for convenience when developing locally. `uv run` will automatically sync the lock file. Because many of our development dependencies are local installs, it will pick up version changes and update the lock file. Passing `--frozen` or setting this environment variable disables the behavior.	2025-02-06 14:36:54 -05:00
ccurme	d172984c91	infra: migrate to uv (#29566 )	2025-02-06 13:36:26 -05:00
ccurme	9da06e6e94	standard-tests[patch]: use `has_structured_output` property to engage structured output tests (#29635 ) Motivation: dedicated structured output features are becoming more common, such that integrations can support structured output without supporting tool calling. Here we make two changes: 1. Update the `has_structured_output` method to default to True if a model supports tool calling (in addition to defaulting to True if `with_structured_output` is overridden). 2. Update structured output tests to engage if `has_structured_output` is True.	2025-02-06 10:09:06 -08:00
Vincent Emonet	db8201d4da	community: fix typo in the module imported when using GPU with FastEmbedEmbeddings (#29631 ) Made a mistake in the module to import (the module stays the same; only the installed package changes). Fixed it and tested it. https://github.com/langchain-ai/langchain/pull/29627 @ccurme	2025-02-06 10:26:08 -05:00
Mohammed Abbadi	f8fd65dea2	community: Update deeplake.py (#29633 ) Deep Lake recently released version 4, which introduces significant architectural changes, including a new on-disk storage format, enhanced indexing mechanisms, and improved concurrency. However, LangChain's vector store integration currently does not support Deep Lake v4 due to breaking API changes. Previously, the installation command was: `pip install deeplake[enterprise]` This installs the latest available version, which now defaults to Deep Lake v4. Since LangChain's vector store integration is still dependent on v3, this can lead to compatibility issues when using Deep Lake as a vector database within LangChain. To ensure compatibility, the installation command has been updated to: `pip install deeplake[enterprise]<4.0.0` This constraint ensures that pip installs the latest available version of Deep Lake within the v3 series while avoiding the incompatible v4 update.	2025-02-06 10:25:13 -05:00
Vincent Emonet	0ac5536f04	community: add support for using GPUs with FastEmbedEmbeddings (#29627 ) - Description: add a `gpu: bool = False` field to the `FastEmbedEmbeddings` class, which enables using a GPU (through the ONNX CUDA provider) when generating embeddings with any fastembed model. It just requires the user to install a different dependency, and we use a different provider when instantiating `fastembed.TextEmbedding` - Issue: when generating embeddings for a really large number of documents this drastically increases performance (honestly that is a must-have in some situations; you can't just use the CPU, it is way too slow) - Dependencies: no direct change to dependencies, but users will need to install `fastembed-gpu` instead of `fastembed`. I made all the changes to the init function to properly let the user know which dependency they should install depending on whether they enabled `gpu` or not. Cf. the fastembed docs about GPU for more details: https://qdrant.github.io/fastembed/examples/FastEmbed_GPU/ I did not add tests because they would require access to a GPU in the testing environment	2025-02-06 08:04:19 -05:00
Dmitrii Rashchenko	0ceda557aa	add o1 and o3-mini to pricing (#29628 ) ### PR Title: community: add latest OpenAI models pricing ### Description: This PR updates the OpenAI model cost calculation mapping by adding the latest OpenAI models, o1 (non-preview) and o3-mini, based on the pricing listed on the [OpenAI pricing page](https://platform.openai.com/docs/pricing). ### Changes: - Added pricing for `o1`, `o1-2024-12-17`, `o1-cached`, and `o1-2024-12-17-cached` for input tokens. - Added pricing for `o1-completion` and `o1-2024-12-17-completion` for output tokens. - Added pricing for `o3-mini`, `o3-mini-2025-01-31`, `o3-mini-cached`, and `o3-mini-2025-01-31-cached` for input tokens. - Added pricing for `o3-mini-completion` and `o3-mini-2025-01-31-completion` for output tokens. ### Issue: N/A ### Dependencies: None ### Testing & Validation: - No functional changes outside of updating the cost mapping. - No tests were added or modified.	2025-02-06 08:02:20 -05:00
ZhangShenao	ac53977dbc	[MistralAI] Improve MistralAIEmbeddings (#29242 ) - Add static method decorator for method. - Add expected exception for retry decorator #29125	2025-02-05 21:31:54 -05:00
Andrew Wason	22aa5e07ed	standard-tests: Fix ToolsIntegrationTests to correctly handle "content_and_artifact" tools (#29391 ) Description: The response from `tool.invoke()` is always a ToolMessage, with content and artifact fields, not a tuple. The tuple is converted to a ToolMessage here `b6ae7ca91d/libs/core/langchain_core/tools/base.py (L726)` Issue: Currently `ToolsIntegrationTests` requires `invoke()` to return a tuple and so standard tests fail for "content_and_artifact" tools. This fixes that to check the returned ToolMessage. This PR also adds a test that now passes.	2025-02-05 21:27:09 -05:00
Mohammad Anash	f849305a56	fixed Bug in PreFilter of AzureCosmosDBNoSqlVectorSearch (#29613 ) Description: Fixes PreFilter value handling in Azure Cosmos DB NoSQL vectorstore. The current implementation fails to handle numeric values in filter conditions, causing an undefined value variable error. This PR adds support for numeric, boolean, and NULL values while maintaining the existing string and list handling. Changes: Added handling for numeric types (int/float) Added boolean value support Added NULL value handling Added type validation for unsupported values Fixed scope of value variable initialization Issue: Fixes #29610 Implementation Notes: No changes to public API Backwards compatible Maintains consistent behavior with existing MongoDB-style filtering Preserves SQL injection prevention through proper value handling --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-06 02:20:26 +00:00
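The type dispatch listed in #29613 (strings, lists, numbers, booleans, NULL, and a rejection of anything else) can be illustrated as below. This is a hypothetical sketch: `format_filter_value`, the literal spellings, and the quote-doubling rule are assumptions for illustration and are not taken from the Azure Cosmos DB vector store code; production code should prefer parameterized queries.

```python
from typing import Any


def format_filter_value(value: Any) -> str:
    """Render a Python value as a SQL-style literal (illustrative only)."""
    if value is None:
        return "NULL"
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Assumed escaping rule: double any embedded single quotes.
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, list):
        return "[" + ", ".join(format_filter_value(v) for v in value) + "]"
    raise ValueError(f"Unsupported filter value type: {type(value).__name__}")
```

Handling every branch explicitly means the result is always defined before use, which is the class of error the commit describes for numeric values.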
Philippe PRADOS	6ff0d5c807	community[minor]: 04 - Refactoring PDFMiner parser (#29526 ) This is one part of a larger Pull Request (PR) that is too large to be submitted all at once. This specific part focuses on updating the PDFMiner parser. For more details, see [PR 28970](https://github.com/langchain-ai/langchain/pull/28970). --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-02-05 21:08:27 -05:00
Isaac Francisco	91ffd7caad	core: allow passing message dicts into ChatPromptTemplate (#29363 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2025-02-05 09:45:52 -08:00
ccurme	69595b0914	docs: fix builds (#29607 ) Failing with: > ValueError: Provider page not found for databricks-langchain. Please add one at docs/integrations/providers/databricks-langchain.{mdx,ipynb}	2025-02-05 14:24:53 +00:00
ccurme	91a33a9211	anthropic[patch]: release 0.3.6 (#29606 )	2025-02-05 14:18:02 +00:00
ccurme	5cbe6aba8f	anthropic[patch]: support citations in streaming (#29591 )	2025-02-05 09:12:07 -05:00
William FH	5ae4ed791d	Drop duplicate inputs (#29589 )	2025-02-04 18:06:10 -08:00
Erick Friis	65f0deb81a	packages: databricks-langchain (#29593 )	2025-02-05 01:53:34 +00:00
Yoav Levy	621bba7e26	docs: add nimble as a provider (#29579 ) ## Description: - Add docs/docs/providers/nimbleway.ipynb - Add docs/docs/integrations/retrievers/nimbleway.ipynb - Register nimbleway in libs/packages.yml - X (twitter) handle: @urielkn / @LevyNorbit8	2025-02-04 16:47:03 -08:00
Erick Friis	50d61eafa2	partners/deepseek: release 0.1.1 (#29592 )	2025-02-04 23:46:38 +00:00
Erick Friis	7edfcbb090	docs: rename to langchain-deepseek in docs (#29587 )	2025-02-04 14:22:17 -08:00
Erick Friis	df8fa882b2	deepseek: bump core (#29584 )	2025-02-04 10:25:46 -08:00
Erick Friis	455f65947a	deepseek: rename to langchain-deepseek from langchain-deepseek-official (#29583 )	2025-02-04 17:57:25 +00:00
Philippe PRADOS	5771e561fb	[Bugfix langchain_community] Fix PyMuPDFLoader (#29550 ) - Description: add legacy properties - Issue: #29470 - Twitter handle: pprados	2025-02-04 09:24:40 -05:00
Ashutosh Kumar	65b404a2d1	[oci_generative_ai] Option to pass auth_file_location (#29481 ) PR title: "community: Option to pass auth_file_location for oci_generative_ai" Description: Option to pass auth_file_location, to override the default config file location "~/.oci/config" where the profile name configs are stored. This does not fix any issue. It just adds an optional parameter called "auth_file_location", which is internally supported by any OCI client, including GenerativeAiInferenceClient.	2025-02-03 21:44:13 -05:00
Teruaki Ishizaki	aeb42dc900	partners: Fixed the procedure of initializing pad_token_id (#29500 ) - Description: Add a check of the model config's pad_token_id and eos_token_id. It seems that this is the same bug as the HuggingFace TGI bug. It is the same bug as #29434 - Issue: #29431 - Dependencies: none - Twitter handle: tell14 Example code follows: ```python from langchain_huggingface.llms import HuggingFacePipeline hf = HuggingFacePipeline.from_model_id( model_id="meta-llama/Llama-3.2-3B-Instruct", task="text-generation", pipeline_kwargs={"max_new_tokens": 10}, ) from langchain_core.prompts import PromptTemplate template = """Question: {question} Answer: Let's think step by step.""" prompt = PromptTemplate.from_template(template) chain = prompt | hf question = "What is electroencephalography?" print(chain.invoke({"question": question})) ```	2025-02-03 21:40:33 -05:00
AmirPoursaberi	a6efd22ba1	Fix a tiny typo in `create_retrieval_chain` docstring (#29552 ) Hi there! To fix a tiny typo in `create_retrieval_chain` docstring.	2025-02-03 10:54:49 -05:00
Hemant Rawat	db1693aa70	community: fix issue #29429 in age_graph.py (#29506 ) ## Description: This PR addresses issue #29429 by fixing the _wrap_query method in langchain_community/graphs/age_graph.py. The method now correctly handles Cypher queries with UNION and EXCEPT operators, ensuring that the fields in the SQL query are ordered as they appear in the Cypher query. Additionally, the method now properly handles cases where RETURN * is not supported. ### Issue: #29429 ### Dependencies: None ### Add tests and docs: Added unit tests in tests/unit_tests/graphs/test_age_graph.py to validate the changes. No new integrations were added, so no example notebook is necessary. Lint and test: Ran make format, make lint, and make test to ensure code quality and functionality.	2025-02-01 21:24:45 -05:00
Keenan Pepper	2f97916dea	docs: Add goodfire notebook and add to packages.yml (#29512 ) - Description: Add Goodfire ipynb notebook and add langchain-goodfire package to packages.yml - Issue: n/a - Dependencies: docs only - Twitter handle: keenanpepper --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-01 19:43:20 -05:00
ccurme	a3c5e4d070	deepseek[patch]: bump langchain-openai and add to scheduled testing (#29535 )	2025-02-01 18:40:59 -05:00
ccurme	16a422f3fa	community: add standard tests for Perplexity (#29534 )	2025-02-01 17:02:57 -05:00
Amit Ghadge	0c405245c4	[Integrations][Tool] Added Jenkins tools support (#29516 ) --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-31 12:50:10 -05:00
Christophe Bornet	aab2e42169	core[patch]: Use Blockbuster to detect blocking calls in asyncio during tests (#29043 ) This PR uses the [blockbuster](https://github.com/cbornet/blockbuster) library in langchain-core to detect blocking calls made in the asyncio event loop during unit tests. Avoiding blocking calls is hard because they can be deeply buried in the code or made in third-party libraries. Blockbuster makes them easier to detect by raising an exception when a call is made to a known blocking function (e.g. `time.sleep`). Adding blockbuster made it possible to find a blocking call in `aconfig_with_context` (it ends up calling `get_function_nonlocals`, which loads function code). Dependencies: - blockbuster (test) Twitter handle: cbornet_	2025-01-31 10:06:34 -05:00
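The detection idea can be illustrated with the standard library alone. The sketch below is not blockbuster's implementation or API; `forbid_blocking_sleep` and `BlockingCallError` are hypothetical names. It only shows the general technique of wrapping a known blocking function so that it raises when called from a running event loop.

```python
import asyncio
import time
from contextlib import contextmanager


class BlockingCallError(RuntimeError):
    """Raised when a blocking function runs inside a running event loop."""


@contextmanager
def forbid_blocking_sleep():
    """Temporarily replace time.sleep with a guarded version."""
    original_sleep = time.sleep

    def guarded_sleep(seconds):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop is running in this thread, so blocking is fine.
            return original_sleep(seconds)
        raise BlockingCallError("time.sleep called inside the event loop")

    time.sleep = guarded_sleep
    try:
        yield
    finally:
        # Always restore the real function.
        time.sleep = original_sleep


async def bad_coroutine():
    time.sleep(0.01)  # blocks the event loop


async def good_coroutine():
    await asyncio.sleep(0.01)  # yields to the event loop
    return "ok"
```

Running `asyncio.run(bad_coroutine())` inside `with forbid_blocking_sleep():` raises `BlockingCallError`, while `good_coroutine` completes normally.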
Philippe PRADOS	ceda8bc050	community[minor]: 03 - Refactoring PyPDF parser (#29330 ) This is one part of a larger Pull Request (PR) that is too large to be submitted all at once. This specific part focuses on updating the PyPDF parser. For more details, see [PR 28970](https://github.com/langchain-ai/langchain/pull/28970).	2025-01-31 10:05:07 -05:00
Julian Castro Pulgarin	b7e3e337b1	community: Fix YahooFinanceNewsTool to handle updated yfinance data structure (#29498 ) Description: Updates the YahooFinanceNewsTool to handle the current yfinance news data structure. The tool was failing with a KeyError due to changes in the yfinance API's response format. This PR updates the code to correctly extract news URLs from the new structure. Issue: #29495 Dependencies: No new dependencies required. Works with existing yfinance package. The changes maintain backwards compatibility while fixing the KeyError that users were experiencing. The modified code properly handles the new data structure where: - News type is now at `content.contentType` - News URL is now at `content.canonicalUrl.url` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-31 02:31:44 +00:00
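A minimal sketch of reading the two fields named above from a news payload. The sample data, the `extract_story_urls` helper, and the `"STORY"` content type value are assumptions for illustration; the real tool's code may differ.

```python
from typing import Any, Dict, List


def extract_story_urls(news: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: collect URLs of story-type news items."""
    urls = []
    for item in news:
        content = item.get("content") or {}
        # News type now lives at content.contentType ("STORY" is assumed).
        if content.get("contentType") != "STORY":
            continue
        # News URL now lives at content.canonicalUrl.url.
        url = (content.get("canonicalUrl") or {}).get("url")
        if url:
            urls.append(url)
    return urls


# Made-up payload in the shape described by the commit message.
sample_news = [
    {"content": {"contentType": "STORY", "canonicalUrl": {"url": "https://example.com/a"}}},
    {"content": {"contentType": "VIDEO", "canonicalUrl": {"url": "https://example.com/b"}}},
    {"content": {"contentType": "STORY"}},  # missing URL is skipped
]
print(extract_story_urls(sample_news))
```

Using `.get()` at each level avoids the kind of KeyError the commit describes when a field is absent.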
Erick Friis	332e303858	partners/mistralai: release 0.2.6 (#29491 )	2025-01-29 22:23:14 +00:00
Erick Friis	2c795f5628	partners/openai: release 0.3.3 (#29490 )	2025-01-29 22:23:03 +00:00
Erick Friis	f307b3cc5f	langchain: release 0.3.17 (#29485 )	2025-01-29 22:22:49 +00:00
Erick Friis	5cad3683b4	partners/groq: release 0.2.4 (#29488 )	2025-01-29 22:22:30 +00:00
Erick Friis	e074c26a6b	partners/fireworks: release 0.2.7 (#29487 )	2025-01-29 22:22:18 +00:00
Erick Friis	685609e1ef	partners/anthropic: release 0.3.5 (#29486 )	2025-01-29 22:22:11 +00:00
Erick Friis	ed3a5e664c	standard-tests: release 0.3.10 (#29484 )	2025-01-29 22:21:05 +00:00
Erick Friis	29461b36d9	partners/ollama: release 0.2.3 (#29489 )	2025-01-29 22:19:44 +00:00
Erick Friis	07e2e80fe7	core: release 0.3.33 (#29483 )	2025-01-29 14:11:53 -08:00
Erick Friis	8f95da4eb1	multiple: structured output tracing standard metadata (#29421 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-29 14:00:26 -08:00
ccurme	284c935b08	tests[patch]: improve coverage of structured output tests (#29478 )	2025-01-29 14:52:09 -05:00
Matheus Torquato	7aae738296	docs:Fix Imports for Document and BaseRetriever (#29473 ) This pull request addresses an issue with import statements in the langchain_core/retrievers.py file. The following changes have been made: Corrected the import for Document from langchain_core.documents.base. Corrected the import for BaseRetriever from langchain_core.retrievers. These changes ensure that the SimpleRetriever class can correctly reference the Document and BaseRetriever classes, improving code reliability and maintainability. --------- Co-authored-by: Matheus Torquato <mtorquat@jaguarlandrover.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-29 14:32:05 +00:00
Mohammad Anash	12bcc85927	added operator filter for supabase (#29475 ) Description: This PR adds support for MongoDB-style $in operator filtering in the Supabase vectorstore implementation. Currently, filtering with $in operators returns no results, even when matching documents exist. This change properly translates MongoDB-style filters to PostgreSQL syntax, enabling efficient multi-document filtering. Changes: - Modified similarity_search_by_vector_with_relevance_scores to handle MongoDB-style $in operators - Added automatic conversion of $in filters to PostgreSQL IN clauses - Preserved original vector type handling and numpy array conversion - Maintained compatibility with existing postgrest filters - Added support for the same filtering in similarity_search_by_vector_returning_embeddings Issue: Closes #27932 Implementation Notes: - No changes to public API or function signatures - Backwards compatible: behavior unchanged for non-$in filters - More efficient than multiple individual queries for multi-ID searches - Preserves all existing functionality including numpy array conversion for vector types Dependencies: None Additional Notes: - The implementation handles proper SQL escaping for filter values - Maintains consistent behavior with other vectorstore implementations that support MongoDB-style operators - Future extensions could support additional MongoDB-style operators ($gt, $lt, etc.) --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-29 14:24:18 +00:00
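The translation from a MongoDB-style `$in` filter to a SQL `IN` clause can be sketched as below. This is not the Supabase vectorstore code: `translate_filter` is a hypothetical function, and it assumes column names come from trusted code while values are passed separately as query parameters.

```python
from typing import Any, Dict, List, Tuple


def translate_filter(filter_: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Hypothetical helper: build a parameterized WHERE fragment."""
    clauses: List[str] = []
    params: List[Any] = []
    for column, condition in filter_.items():
        if isinstance(condition, dict) and "$in" in condition:
            values = list(condition["$in"])
            if not values:
                # An empty $in list can never match.
                clauses.append("FALSE")
                continue
            placeholders = ", ".join(["%s"] * len(values))
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
        else:
            # Plain equality for non-$in filters.
            clauses.append(f"{column} = %s")
            params.append(condition)
    return " AND ".join(clauses), params


where, params = translate_filter({"id": {"$in": [1, 2, 3]}, "lang": "en"})
print(where, params)
```

Keeping the values out of the SQL string leaves escaping to the database driver.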
ccurme	585f467d4a	mistral[patch]: release 0.2.5 (#29463 )	2025-01-28 18:29:54 -05:00
ccurme	ca9d4e4595	mistralai: support method="json_schema" in structured output (#29461 ) https://docs.mistral.ai/capabilities/structured-output/custom_structured_output/	2025-01-28 18:17:39 -05:00
Michael Chin	e120378695	community: Additional AWS deprecations (#29447 ) Added deprecation warnings for a few more classes that were moved to the `langchain-aws` package: - [SageMaker Endpoint LLM](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html) - [Amazon Kendra retriever](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.kendra.AmazonKendraRetriever.html) - [Amazon Bedrock Knowledge Bases retriever](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html)	2025-01-28 09:50:14 -05:00
Erick Friis	2d776351af	community: release 0.3.16 (#29452 )	2025-01-28 07:44:54 +00:00
Erick Friis	737a68fcdc	langchain: release 0.3.16 (#29451 )	2025-01-28 07:31:09 +00:00
Erick Friis	8bf9c71673	core: release 0.3.32 (#29450 )	2025-01-28 07:20:04 +00:00
Erick Friis	ecdc881328	langchain: add deepseek provider to init chat model (#29449 )	2025-01-27 23:13:59 -08:00
Erick Friis	dced0ed3fd	deepseek, docs: chatdeepseek integration added (#29445 )	2025-01-28 06:32:58 +00:00
Isaac Francisco	2bb2c9bfe8	change behavior for converting a string to openai messages (#29446 )	2025-01-27 18:18:54 -08:00
ccurme	b1fdac726b	groq[patch]: update model used in test (#29441 ) `llama-3.1-70b-versatile` was [shut down](https://console.groq.com/docs/deprecations).	2025-01-27 21:11:44 +00:00
Adrián Panella	1551d9750c	community(doc_loaders): allow any credential type in AzureAIDocumentI… (#29289 ) Allow any credential type in AzureAIDocumentIntelligence, not only `api_key`. This makes it possible to use any of the credential types integrated with AD. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-27 20:56:30 +00:00
ccurme	f00c66cc1f	chroma[patch]: release 0.2.1 (#29440 )	2025-01-27 20:41:35 +00:00
Jorge Piedrahita Ortiz	3b886cdbb2	libs: add sambanova-langchain integration package (#29417 ) - Description: Add sambanova-langchain integration package as suggested in previous PRs --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-27 20:34:55 +00:00
Mohammad Anash	aba1fd0bd4	fixed similarity search with score error #29407 (#29413 ) Description: Fix TypeError in AzureSearch similarity_search_with_score by removing search_type from kwargs before passing to underlying requests. This resolves issue #29407 where search_type was being incorrectly passed through to Session.request(). Issue: #29407 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-27 20:34:42 +00:00
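The fix described here amounts to dropping a routing-only keyword before forwarding the remaining kwargs. A minimal sketch, with `low_level_search` as a hypothetical stand-in for the underlying client call (it is not the Azure SDK):

```python
from typing import Any, Dict, Optional


def low_level_search(query: str, *, top: int = 4, filter: Optional[str] = None) -> Dict[str, Any]:
    """Stand-in for a client call that rejects unknown keyword arguments."""
    return {"query": query, "top": top, "filter": filter}


def similarity_search_with_score(query: str, k: int = 4, **kwargs: Any) -> Dict[str, Any]:
    # search_type only selects the search strategy; the lower layer does
    # not accept it, so remove it before forwarding the rest.
    kwargs.pop("search_type", None)
    return low_level_search(query, top=k, **kwargs)


print(similarity_search_with_score("hello", k=2, search_type="hybrid"))
```

Without the `pop`, the call would raise `TypeError` for the unexpected `search_type` keyword.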
itaismith	7b404fcd37	partners[chroma]: Upgrade Chroma to 0.6.x (#29404 )	2025-01-27 15:32:21 -05:00
Teruaki Ishizaki	3fce78994e	community: Fixed the procedure of initializing pad_token_id (#29434 ) - Description: Check the pad_token_id and eos_token_id of the model config. This appears to be the same bug as the HuggingFace TGI bug. In addition, the source code of libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py also requires similar changes. - Issue: #29431 - Dependencies: none - Twitter handle: tell14	2025-01-27 14:54:54 -05:00
Christophe Bornet	dbb6b7b103	core: Add ruff rules TRY (tryceratops) (#29388 ) TRY004 ("use TypeError rather than ValueError") existing errors are marked as ignore to preserve backward compatibility. LMK if you prefer to fix some of them. Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-24 05:01:40 +00:00
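For context, TRY004 flags code that raises a non-type error after a failed type check. A small sketch of the pattern the rule prefers, using a made-up `ensure_text` function:

```python
def ensure_text(value):
    """Return value unchanged if it is a string, otherwise raise."""
    if not isinstance(value, str):
        # The failure is about the argument's type, so TypeError fits
        # better than ValueError; this is what TRY004 asks for.
        raise TypeError(f"expected str, got {type(value).__name__}")
    return value


print(ensure_text("hello"))
```

Switching an existing `ValueError` to `TypeError` can break callers that catch `ValueError`, which is consistent with the commit marking existing violations as ignored.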
Erick Friis	723b603f52	docs: groq api key links (#29402 )	2025-01-24 04:33:18 +00:00
ccurme	bbc50f65e7	anthropic[patch]: release 0.3.4 (#29399 )	2025-01-23 23:55:58 +00:00
ccurme	ed797e17fb	anthropic[patch]: always return content blocks if citations are generated (#29398 ) We currently return string (and therefore no content blocks / citations) if the response is of the form ``` [ {"text": "a claim", "citations": [...]}, ] ``` There are other cases where we do return citations as-is: ``` [ {"text": "a claim", "citations": [...]}, {"text": "some other text"}, {"text": "another claim", "citations": [...]}, ] ``` Here we update to return content blocks including citations in the first case as well.	2025-01-23 18:47:23 -05:00
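The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the langchain-anthropic implementation: `format_content` is a hypothetical function, and the rule "collapse to a string only for a single text block without citations" is inferred from the commit message.

```python
from typing import Any, Dict, List, Union

Block = Dict[str, Any]


def format_content(blocks: List[Block]) -> Union[str, List[Block]]:
    """Hypothetical helper: decide between a plain string and content blocks."""
    if (
        len(blocks) == 1
        and "text" in blocks[0]
        and not blocks[0].get("citations")
    ):
        # A lone text block with no citations collapses to a string.
        return blocks[0]["text"]
    # Anything carrying citations (or several blocks) stays as blocks.
    return blocks


print(format_content([{"text": "plain answer"}]))
print(format_content([{"text": "a claim", "citations": [{"source": "doc-1"}]}]))
```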
Bagatur	317fb86fd9	openai[patch]: fix int test (#29395 )	2025-01-23 21:23:01 +00:00
Bagatur	8d566a8fe7	openai[patch]: detect old models in with_structured_output (#29392 ) Co-authored-by: ccurme <chester.curme@gmail.com>	2025-01-23 20:47:32 +00:00
Christophe Bornet	b6ae7ca91d	core: Cache RunnableLambda __repr__ (#29199 ) `RunnableLambda`'s `__repr__` may do costly OS operation by calling `get_lambda_source`. So it's better to cache it. See #29043 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-23 18:34:47 +00:00
Christophe Bornet	618e550f06	core: Cache RunnableLambda deps (#29200 ) `RunnableLambda`'s `deps` may do costly OS operation by calling `get_function_nonlocals`. So it's better to cache it. See #29043	2025-01-23 13:09:07 -05:00
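The caching approach in this commit and the previous one can be illustrated with `functools.cached_property`. The `LambdaWrapper` class below is a made-up stand-in for `RunnableLambda`; the counter exists only to show that the expensive lookup runs once.

```python
import functools


class LambdaWrapper:
    """Made-up wrapper whose repr is expensive to compute."""

    def __init__(self, func):
        self.func = func
        self.source_lookups = 0  # counts how often the costly path runs

    @functools.cached_property
    def _repr(self) -> str:
        # Stand-in for a costly operation such as reading source code.
        self.source_lookups += 1
        return f"LambdaWrapper({self.func.__name__})"

    def __repr__(self) -> str:
        return self._repr


def double(x):
    return 2 * x


wrapped = LambdaWrapper(double)
repr(wrapped)
repr(wrapped)
print(wrapped.source_lookups)
```

The cached value lives on the instance, so it is computed at most once per object and released together with it.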
ccurme	f795ab99ec	docs: fix title rendered for integration package (#29387 ) "Tilores LangchAIn" -> "Tilores"	2025-01-23 12:21:19 -05:00
Stefan Berkner	8977451c76	docs: add Tilores provider and tools (#29244 ) Description: This PR adds documentation for the Tilores provider and tools. Issue: closes #26320	2025-01-23 12:17:59 -05:00
Ahmed Tammaa	d5b8aabb32	text-splitters[patch]: delete unused html_chunks_with_headers.xslt (#29340 ) This pull request removes the now-unused html_chunks_with_headers.xslt file from the codebase. In a previous update ([PR #27678](https://github.com/langchain-ai/langchain/pull/27678)), the HTMLHeaderTextSplitter class was refactored to utilize BeautifulSoup instead of lxml and XSLT for HTML processing. As a result, the html_chunks_with_headers.xslt file is no longer necessary and can be safely deleted to maintain code cleanliness and reduce potential confusion. Issue: N/A Dependencies: N/A	2025-01-23 11:29:08 -05:00
Wang Ran (汪然)	8f2c11e17b	core[patch]: fix API reference for draw_ascii (#29370 ) Typo fix: the method is `draw_ascii`, not `draw`, along with other corrections. The example now works. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-23 16:04:58 +00:00
Loris Alexandre	e4921239a6	community: missing mandatory parameter partition_key for AzureCosmosDBNoSqlVectorSearch (#29382 ) - Description: the `delete` function of AzureCosmosDBNoSqlVectorSearch is using `self._container.delete_item(document_id)` which miss a mandatory parameter `partition_key` We use the class function `delete_document_by_id` to provide a default `partition_key` - Issue: #29372 - Dependencies: None - Twitter handle: None Co-authored-by: Loris Alexandre <loris.alexandre@boursorama.fr>	2025-01-23 10:05:10 -05:00
Terry Tan	ec0ebb76f2	community: fix Google Scholar tool errors (#29371 ) Resolve https://github.com/langchain-ai/langchain/issues/27557	2025-01-23 10:03:01 -05:00
江同学呀	a1e62070d0	community: Fix the problem of error reporting when OCR extracts text from PDF. (#29378 ) - Description: Fixes an issue in the `_extract_images_from_page()` method where images could not be recognized from `xObject[obj]["/Filter"]`, whose value can be either a string or a list of strings. It also resolves a bug where vectorization with Faiss fails with `IndexError: list index out of range` because image extraction fails on a PDF that contains only images. - Issue: Fixes the following issues: [#15227](https://github.com/langchain-ai/langchain/issues/15227) [#22892](https://github.com/langchain-ai/langchain/issues/22892) [#26652](https://github.com/langchain-ai/langchain/issues/26652) [#27153](https://github.com/langchain-ai/langchain/issues/27153) Related issues: [#7067](https://github.com/langchain-ai/langchain/issues/7067) - Dependencies: None - Twitter handle: None --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-23 15:01:52 +00:00
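Handling a `/Filter` entry that may be a single name or a list of names usually comes down to normalizing it to a list first. A minimal sketch with hypothetical helper names (`filter_names`, `uses_any_filter`); the parser's real code may be organized differently.

```python
from typing import Iterable, List, Union


def filter_names(filter_value: Union[str, Iterable[str]]) -> List[str]:
    """Normalize a /Filter entry to a list of filter names."""
    if isinstance(filter_value, str):
        return [filter_value]
    return [str(name) for name in filter_value]


def uses_any_filter(filter_value: Union[str, Iterable[str]], candidates: Iterable[str]) -> bool:
    """True if at least one of the image's filters is in candidates."""
    wanted = set(candidates)
    return any(name in wanted for name in filter_names(filter_value))


# Filter names as they appear in PDF image dictionaries.
jpeg_filters = ["/DCTDecode", "/JPXDecode"]
print(uses_any_filter("/DCTDecode", jpeg_filters))
print(uses_any_filter(["/FlateDecode", "/DCTDecode"], jpeg_filters))
```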
Tim Mallezie	a13faab6b7	community: allow setting the gitlab url in the gitlab tool constructor (#29380 ) This PR allows the GitLab URL to be set in the constructor as well, instead of only through environment variables. This makes the following possible:
```
# Create the GitLab API wrapper
gitlab_api = GitLabAPIWrapper(
    gitlab_url=self.gitlab_url,
    gitlab_personal_access_token=self.gitlab_personal_access_token,
    gitlab_repository=self.gitlab_repository,
    gitlab_branch=self.gitlab_branch,
    gitlab_base_branch=self.gitlab_base_branch,
)
```
Previously, the URL could not be set in the constructor. Co-authored-by: Tim Mallezie <tim.mallezie@dropsolid.com>	2025-01-23 09:36:27 -05:00
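The "constructor argument first, environment variable second" pattern can be sketched as below. `resolve_gitlab_url` is a hypothetical helper, and both the `GITLAB_URL` variable name and the `https://gitlab.com` default are assumptions rather than details confirmed by this commit message.

```python
import os
from typing import Optional


def resolve_gitlab_url(gitlab_url: Optional[str] = None, default: str = "https://gitlab.com") -> str:
    """Hypothetical helper: explicit argument wins, then env, then default."""
    if gitlab_url:
        return gitlab_url
    # GITLAB_URL is an assumed environment variable name.
    return os.environ.get("GITLAB_URL") or default


print(resolve_gitlab_url("https://gitlab.example.com"))
```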
Tyllen	f2ea62f632	docs: add payman docs (#29362 ) - Description: Adding the docs to use the payman-langchain integration :)	2025-01-22 18:37:47 -08:00
Erick Friis	3f1d20964a	standard-tests: release 0.3.9 (#29356 )	2025-01-22 09:46:19 -08:00
Macs Dickinson	7378c955db	community: adds support for getting github releases for the configured repository (#29318 ) Description: adds support for the github tool to query github releases on the configured repository Issue: N/A Dependencies: N/A Twitter handle: @macsdickinson --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-22 15:45:52 +00:00
Tayaa Med Amine	ef1610e24a	langchain[patch]: support ollama in init_embeddings (#29349 ) Why not Ollama? --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-22 14:47:12 +00:00
Siddhant	9eb10a9240	langchain: added vectorstore docstring linting (#29241 ) …ore.py Added docstring linting in the vectorstore.py file relating to issue #25154 --------- Co-authored-by: Siddhant Jain <sjain35@buffalo.edu> Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-22 03:47:43 +00:00
Sohan	de1fc4811d	packages, docs: Pipeshift - Langchain integration of pipeshift (#29114 ) Description: Added pipeshift integration. This integrates pipeshift LLM and ChatModels APIs with langchain Dependencies: none Unit Tests & Integration tests are added Documentation is added as well This PR is w.r.t [#27390](https://github.com/langchain-ai/langchain/pull/27390) and as per request, a freshly minted `langchain-pipeshift` package is uploaded to PYPI. Only changes to the docs & packages.yml are made in langchain master branch --------- Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-22 03:03:06 +00:00
Christophe Bornet	836c791829	text-splitters: Bump ruff version to 0.9 (#29231 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-22 00:27:58 +00:00
Christophe Bornet	a004dec119	langchain: Bump ruff version to 0.9 (#29211 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-22 00:26:39 +00:00
Christophe Bornet	2340b3154d	standard-tests: Bump ruff version to 0.9 (#29230 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-22 00:23:01 +00:00
Christophe Bornet	e4a78dfc2a	core: Bump ruff version to 0.9 (#29201 ) Also run some preview autofix and formatting --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-22 00:20:09 +00:00
Ella Charlaix	6f95db81b7	huggingface: Add IPEX models support (#29179 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-22 00:16:44 +00:00
Bhav Sardana	d6a7aaa97d	community: Fix for Pydantic model validator of GoogleApiClient (#29346 ) - Description: Fix for the Pydantic model validator for GoogleApiHandler - Issue: #29165 --------- Signed-off-by: Bhav Sardana <sardana.bhav@gmail.com>	2025-01-21 15:17:43 -05:00
Christophe Bornet	1c4ce7b42b	core: Auto-fix some docstrings (#29337 )	2025-01-21 13:29:53 -05:00
ccurme	86a0720310	fireworks[patch]: update model used in integration tests (#29342 ) No access to firefunction-v1 and -v2.	2025-01-21 11:05:30 -05:00
Hugo Berg	32c9c58adf	Community: fix missing f-string modifier in oai structured output parsing error (#29326 ) - Description: The ValueError raised on certain structured-output parsing errors in the langchain openai community integration was missing an f-string modifier and so didn't produce useful output. This is a 2-line, 2-character change. - Issue: None open that this fixes - Dependencies: Nothing changed - Twitter handle: None - [X] Add tests and docs: There is nothing to add. - [-] Lint and test: Happy to run this if you deem it necessary. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-21 14:26:38 +00:00
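The class of bug fixed here is easy to reproduce: without the `f` prefix, the braces are emitted literally instead of being interpolated. The function names and message text below are made up for illustration.

```python
def broken_message(error: Exception) -> str:
    # Missing "f" prefix: "{error}" is kept as literal text.
    return "Failed to parse structured output: {error}"


def fixed_message(error: Exception) -> str:
    # With the prefix, the error details are interpolated.
    return f"Failed to parse structured output: {error}"


err = KeyError("name")
print(broken_message(err))
print(fixed_message(err))
```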
Nuno Campos	566915d7cf	core: fix call to get closure vars for partial-wrapped funcs (#29316 )	2025-01-21 09:26:15 -05:00
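As background for this fix: `inspect.getclosurevars` accepts only plain Python functions and methods, so a `functools.partial` object has to be unwrapped first. The sketch below shows one way to do that; `get_closure_vars` is a hypothetical helper, not necessarily how langchain-core handles it.

```python
import functools
import inspect


def get_closure_vars(func):
    """Unwrap functools.partial layers, then inspect the real function."""
    while isinstance(func, functools.partial):
        func = func.func
    return inspect.getclosurevars(func)


def make_adder(n):
    def add(x, y):
        return x + y + n  # n is captured from the enclosing scope

    return add


partial_add = functools.partial(make_adder(3), 1)
print(get_closure_vars(partial_add).nonlocals)
```

Calling `inspect.getclosurevars(partial_add)` directly raises `TypeError`, because a partial object is not a Python function.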
ZhangShenao	33e22ccb19	[Doc] Improve api doc (#29324 ) - Fix doc description - Add static method decorator	2025-01-21 09:16:08 -05:00
Bagatur	536b44a47f	community[patch]: Release 0.3.15 (#29325 )	2025-01-21 03:10:07 +00:00
Bagatur	ec5fae76d4	langchain[patch]: Release 0.3.15 (#29322 )	2025-01-21 02:24:11 +00:00
Bagatur	923e6fb321	core[patch]: 0.3.31 (#29320 )	2025-01-21 01:17:31 +00:00
Ahmed Tammaa	d3ed9b86be	text-splitters[minor]: Replace lxml and XSLT with BeautifulSoup in HTMLHeaderTextSplitter for Improved Large HTML File Processing (#27678 ) This pull request updates the `HTMLHeaderTextSplitter` by replacing the `split_text_from_file` method's implementation. The original method used `lxml` and XSLT for processing HTML files, which caused `lxml.etree.xsltapplyerror maxhead` when handling large HTML documents due to limitations in the XSLT processor. Fixes #13149 By switching to BeautifulSoup (`bs4`), we achieve: - Improved Performance and Reliability: BeautifulSoup efficiently processes large HTML files without the errors associated with `lxml` and XSLT. - Simplified Dependencies: Removes the dependency on `lxml` and external XSLT files, relying instead on the widely used `beautifulsoup4` library. - Maintained Functionality: The new method replicates the original behavior, ensuring compatibility with existing code and preserving the extraction of content and metadata. Issue: This change addresses issues related to processing large HTML files with the existing `HTMLHeaderTextSplitter` implementation. It resolves problems where users encounter lxml.etree.xsltapplyerror maxhead due to large HTML documents. Dependencies: - BeautifulSoup (`beautifulsoup4`): The `beautifulsoup4` library is now used for parsing HTML content. - Installation: `pip install beautifulsoup4` Code Changes: Updated the `split_text_from_file` method in `HTMLHeaderTextSplitter` as follows:
```python
def split_text_from_file(self, file: Any) -> List[Document]:
    """Split HTML file using BeautifulSoup.

    Args:
        file: HTML file path or file-like object.

    Returns:
        List of Document objects with page_content and metadata.
    """
    from bs4 import BeautifulSoup
    from langchain.docstore.document import Document
    import bs4

    # Read the HTML content from the file or file-like object
    if isinstance(file, str):
        with open(file, 'r', encoding='utf-8') as f:
            html_content = f.read()
    else:
        # Assuming file is a file-like object
        html_content = file.read()

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Extract the header tags and their corresponding metadata keys
    headers_to_split_on = [tag[0] for tag in self.headers_to_split_on]
    header_mapping = dict(self.headers_to_split_on)

    documents = []

    # Find the body of the document
    body = soup.body if soup.body else soup

    # Find all header tags in the order they appear
    all_headers = body.find_all(headers_to_split_on)

    # If there's content before the first header, collect it
    first_header = all_headers[0] if all_headers else None
    if first_header:
        pre_header_content = ''
        for elem in first_header.find_all_previous():
            if isinstance(elem, bs4.Tag):
                text = elem.get_text(separator=' ', strip=True)
                if text:
                    pre_header_content = text + ' ' + pre_header_content
        if pre_header_content.strip():
            documents.append(Document(
                page_content=pre_header_content.strip(),
                metadata={}  # No metadata since there's no header
            ))
    else:
        # If no headers are found, return the whole content
        full_text = body.get_text(separator=' ', strip=True)
        if full_text.strip():
            documents.append(Document(
                page_content=full_text.strip(),
                metadata={}
            ))
        return documents

    # Process each header and its associated content
    for header in all_headers:
        current_metadata = {}
        header_name = header.name
        header_text = header.get_text(separator=' ', strip=True)
        current_metadata[header_mapping[header_name]] = header_text

        # Collect all sibling elements until the next header of the same or higher level
        content_elements = []
        for sibling in header.find_next_siblings():
            if sibling.name in headers_to_split_on:
                # Stop at the next header
                break
            if isinstance(sibling, bs4.Tag):
                content_elements.append(sibling)

        # Get the text content of the collected elements
        current_content = ''
        for elem in content_elements:
            text = elem.get_text(separator=' ', strip=True)
            if text:
                current_content += text + ' '

        # Create a Document if there is content
        if current_content.strip():
            documents.append(Document(
                page_content=current_content.strip(),
                metadata=current_metadata.copy()
            ))
        else:
            # If there's no content, but we have metadata, still create a Document
            documents.append(Document(
                page_content='',
                metadata=current_metadata.copy()
            ))

    return documents
```
--------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-01-20 16:10:37 -05:00
Christophe Bornet	989eec4b7b	core: Add ruff rule S101 (no assert) (#29267 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-01-20 20:24:31 +00:00
Christophe Bornet	e5d62c6ce7	core: Add ruff rule W293 (whitespaces) (#29272 )	2025-01-20 15:16:12 -05:00
Philippe PRADOS	4efc5093c1	community[minor]: Refactoring PyMuPDF parser, loader and add image blob parsers (#29063 ) * Adds BlobParsers for images. These implementations can take an image and produce one or more documents per image. This interface can be used for exposing OCR capabilities. * Update PyMuPDFParser and Loader to standardize metadata, handle images, improve table extraction etc. - Twitter handle: pprados This is one part of a larger Pull Request (PR) that is too large to be submitted all at once. This specific part focuses on preparing the update of all parsers. For more details, see [PR 28970](https://github.com/langchain-ai/langchain/pull/28970). --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2025-01-20 15:15:43 -05:00
Syed Baqar Abbas	f175319303	[feat] Added backwards compatibility for OllamaEmbeddings initialization (migration from `langchain_community.embeddings` to `langchain_ollama.embeddings`) (#29296 ) - Description: Given that `OllamaEmbeddings` from `langchain_community.embeddings` is deprecated, code is being shifted to `langchain_ollama.embeddings`. However, the new class does not offer backward compatibility for initializing the parameters of the `OllamaEmbeddings` object. - Issue: #29294 - Dependencies: None - Twitter handle: @BaqarAbbas2001 ## Additional Information Previously, `OllamaEmbeddings` from `langchain_community.embeddings` used to support the following options: `e9abe583b2/libs/community/langchain_community/embeddings/ollama.py (L125-L139)` However, in the new package `from langchain_ollama import OllamaEmbeddings`, there is no method to set these options. I have added these parameters to resolve this issue. This issue was also discussed in https://github.com/langchain-ai/langchain/discussions/29113	2025-01-20 11:16:29 -05:00
CLOVA Studio 개발	7a95ffc775	community: fix some features on Naver ChatModel & embedding model 2 (#29243 ) ## Description - Respond to the `NCP API Key` changes. - Fix the `ChatClovaX` `astream` function to raise `SSEError` when an error event occurs. - Add `token length` and `ai_filter` to ChatClovaX's `response_metadata`. - Update the documentation to reflect the NCP API Key changes. cc. @efriis @vbarda	2025-01-20 11:01:03 -05:00
Sangyun_LEE	5d64597490	docs: fix broken Appearance of langchain_community/document_loaders/recursive_url_loader API Reference (#29305 ) # PR message ## Description Fixed the broken appearance of the RecursiveUrlLoader API Reference. ## Issue: N/A ## Dependencies None ## Twitter handle N/A # Add tests and docs Not applicable; this change only affects documentation. # Lint and test Ran make format, make lint, and make test to ensure no issues.	2025-01-20 10:56:59 -05:00
Hemant Rawat	6c52378992	Add Google-style docstring linting and update pyproject.toml (#29303 ) ### Description: This PR introduces Google-style docstring linting for the ModelLaboratory class in libs/langchain/langchain/model_laboratory.py. It also updates the pyproject.toml file to comply with the latest Ruff configuration standards (top-level lint settings are deprecated in favor of the `[tool.ruff.lint]` section). ### Changes include: - [x] Added detailed Google-style docstrings to all methods in ModelLaboratory. - [x] Updated pyproject.toml to move select and pydocstyle settings under the [tool.ruff.lint] section. - [x] Ensured all files pass Ruff linting. Issue: Closes #25154 ### Dependencies: No additional dependencies are required for this change. ### Checklist - [x] Files pass ruff linting. - [x] Docstrings conform to the Google-style convention. - [x] pyproject.toml updated to avoid deprecation warnings. - [x] My PR is ready to review, please review.	2025-01-19 14:37:21 -05:00
Mohammad Mohtashim	b5fbebb3c8	(Community): Changing the BaseURL and Model for MiniMax (#29299 ) - Description: Changed the Base Default Model and Base URL to correct versions. Plus added a more explicit exception if user provides an invalid API Key - Issue: #29278	2025-01-19 14:15:02 -05:00
ccurme	c20f7418c7	openai[patch]: fix Azure LLM test (#29302 ) The tokens I get are: ``` ['', '\n\n', 'The', ' sun', ' was', ' setting', ' over', ' the', ' horizon', ',', ' casting', ''] ``` so possibly an extra empty token is included in the output. lmk @efriis if we should look into this further.	2025-01-19 17:25:42 +00:00
ccurme	6b249a0dc2	openai[patch]: release 0.3.1 (#29301 )	2025-01-19 17:04:00 +00:00
ThomasSaulou	e9abe583b2	chatperplexity stream-citations in additional kwargs (#29273 ) --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-18 22:31:10 +00:00
TheSongg	1cd4d8d101	[langchain_community.llms.xinference]: Rewrite _stream() method and support stream() method in xinference.py (#29259 ) - [ ] PR message: Rewrite the _stream method so that chain.stream() can be used to return data streams. chain = prompt | llm chain.stream(input=user_input) - [ ] tests: from langchain_community.llms import Xinference from langchain.prompts import PromptTemplate llm = Xinference( server_url="http://0.0.0.0:9997", # replace with your xinference server url model_uid={model_uid}, # replace model_uid with the model UID returned from launching the model stream = True ) prompt = PromptTemplate(input=['country'], template="Q: where can we visit in the capital of {country}? A:") chain = prompt | llm chain.stream(input={'country': 'France'})	2025-01-17 20:31:59 -05:00
ccurme	184ea8aeb2	anthropic[patch]: update tool choice type (#29276 )	2025-01-17 15:26:33 -05:00
ccurme	ac52021097	anthropic[patch]: release 0.3.2 (#29275 )	2025-01-17 19:48:31 +00:00
ccurme	c616b445f2	anthropic[patch]: support `parallel_tool_calls` (#29257 ) Need to: - Update docs - Decide if this is an explicit kwarg of bind_tools - Decide if this should be in standard test with flag for supporting	2025-01-17 19:41:41 +00:00
ccurme	d5360b9bd6	core[patch]: release 0.3.30 (#29256 )	2025-01-16 17:52:37 -05:00
Nuno Campos	595297e2e5	core: Add support for calls in get_function_nonlocals (#29255 )	2025-01-16 14:43:42 -08:00
Luis Lopez	75663f2cae	community: Add cost per 1K tokens for fine-tuned model cached input (#29248 ) ### Description - No cost per 1K input tokens is available for the fine-tuned cached version of `gpt-4o-mini-2024-07-18`, so the `OpenAICallbackHandler` raises an error when making calls with that model. - This PR adds the price to the `MODEL_COST_PER_1K_TOKENS` dictionary. cc. @efriis	2025-01-16 15:19:26 -05:00
Junon	667d2a57fd	add mode arg to OBSFileLoader.load() method (#29246 ) - Description: add mode arg to OBSFileLoader.load() method - Issue: #29245 - Dependencies: no dependencies required for this change --------- Co-authored-by: Junon_Gz <junon_gz@qq.com>	2025-01-16 11:09:04 -05:00
Erick Friis	5eb4dc5e06	standard-tests: double messages test (#29237 )	2025-01-15 15:14:29 -08:00
Nithish Raghunandanan	1051fa5729	couchbase: Migrate couchbase partner package to different repo (#29239 ) Description: Migrate the couchbase partner package to [Couchbase-Ecosystem](https://github.com/Couchbase-Ecosystem/langchain-couchbase) org	2025-01-15 12:37:27 -08:00
Nadeem Sajjad	eaf2fb287f	community(pypdfloader): added page_label in metadata for pypdf loader (#29225 ) # Description ## Summary This PR adds support for handling multi-labeled page numbers in the PyPDFLoader. Some PDFs use complex page numbering systems where the actual content may begin after multiple introductory pages. The page_label field helps accurately reflect the document’s page structure, making it easier to handle such cases during document parsing. ## Motivation This feature improves document parsing accuracy by allowing users to access the actual page labels instead of relying only on the physical page numbers. This is particularly useful for documents where the first few pages have roman numerals or other non-standard page labels. ## Use Case This feature is especially useful for Retrieval-Augmented Generation (RAG) systems where users may reference page numbers when asking questions. Some PDFs have both labeled page numbers (like roman numerals for introductory sections) and index-based page numbers. For example, a user might ask: "What is mentioned on page 5?" The system can now check both: • Index-based page number (page) • Labeled page number (page_label) This dual-check helps improve retrieval accuracy. Additionally, the results can be validated with an agent or tool to ensure the retrieved pages match the user’s query contextually. ## Code Changes - Added a page_label field to the metadata of the Document class in PyPDFLoader. - Implemented support for retrieving page_label from the pdf_reader.page_labels. - Created a test case (test_pypdf_loader_with_multi_label_page_numbers) with a sample PDF containing multi-labeled pages (geotopo-komprimiert.pdf) [[Source of pdf](https://github.com/py-pdf/sample-files/blob/main/009-pdflatex-geotopo/GeoTopo-komprimiert.pdf)]. - Updated existing tests to ensure compatibility and verify page_label extraction. ## Tests Added - Added a new test case for a PDF with multi-labeled pages. 
- Verified both page and page_label metadata fields are correctly extracted. ## Screenshots <img width="549" alt="image" src="https://github.com/user-attachments/assets/65db9f5c-032e-4592-926f-824777c28f33" />	2025-01-15 14:18:07 -05:00
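The dual check described in the use case above can be sketched as follows. This is only an illustration on plain dictionaries: `find_pages` is a hypothetical helper, not part of `PyPDFLoader`, and it assumes that the `page` metadata field is a zero-based index while `page_label` holds the label string printed in the PDF.

```python
from typing import Dict, List


def find_pages(docs: List[Dict], requested: str) -> List[Dict]:
    """Return documents whose index-based or labeled page matches `requested`."""
    matches = []
    for doc in docs:
        meta = doc["metadata"]
        # Assumption: `page` is a zero-based index, so a user-facing "5" maps to 4.
        by_index = requested.isdigit() and meta.get("page") == int(requested) - 1
        # `page_label` is compared as a string so roman numerals also work.
        by_label = meta.get("page_label") == requested
        if by_index or by_label:
            matches.append(doc)
    return matches


# Hypothetical metadata for a PDF with two roman-numbered front-matter pages.
docs = [
    {"page_content": "Preface", "metadata": {"page": 0, "page_label": "i"}},
    {"page_content": "Contents", "metadata": {"page": 1, "page_label": "ii"}},
    {"page_content": "Chapter 1", "metadata": {"page": 2, "page_label": "1"}},
]

candidates = find_pages(docs, "1")
```

Returning every candidate, rather than picking one, leaves the final disambiguation to a later step such as the agent or tool validation mentioned in the commit message.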
Mehdi	1a38948ee3	Mehdi zare/fmp data doc (#29219 ) Title: community: add Financial Modeling Prep (FMP) API integration Description: Adding LangChain integration for Financial Modeling Prep (FMP) API to enable semantic search and structured tool creation for financial data endpoints. This integration provides semantic endpoint search using vector stores and automatic tool creation with proper typing and error handling. Users can discover relevant financial endpoints using natural language queries and get properly typed LangChain tools for discovered endpoints. Issue: N/A Dependencies: fmp-data>=0.3.1 langchain-core>=0.1.0 faiss-cpu tiktoken Twitter handle: @mehdizarem Unit tests and example notebook have been added: Tests are in tests/integration_tests/est_tools.py and tests/unit_tests/test_tools.py Example notebook is in docs/tools.ipynb All format, lint and test checks pass: pytest mypy . Dependencies are imported within functions and not added to pyproject.toml. The changes are backwards compatible and only affect the community package. --------- Co-authored-by: mehdizare <mehdizare@users.noreply.github.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-15 15:31:01 +00:00
Mohammad Mohtashim	288613d361	(text-splitters): Small Fix in `_process_html` for HTMLSemanticPreservingSplitter to properly extract the metadata. (#29215 ) - Description: Include `main` in the list of elements whose child elements need to be processed for splitting the HTML. - Issue: #29184	2025-01-15 10:18:06 -05:00
TheSongg	4867fe7ac8	[langchain_community.llms.xinference]: fix error in xinference.py (#29216 ) - [ ] PR message: - The old code raised a ValidationError (pydantic_core._pydantic_core.ValidationError: 1 validation error for Xinference) when importing Xinference from xinference.py. This issue has been resolved by adjusting its type and default value. File "/media/vdc/python/lib/python3.10/site-packages/pydantic/main.py", line 212, in __init__ validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self) pydantic_core._pydantic_core.ValidationError: 1 validation error for Xinference client Field required [type=missing, input_value={'server_url': 'http://10...t4', 'model_kwargs': {}}, input_type=dict] For further information visit https://errors.pydantic.dev/2.9/v/missing - [ ] tests: from langchain_community.llms import Xinference llm = Xinference( server_url="http://0.0.0.0:9997", # replace with your xinference server url model_uid={model_uid} # replace model_uid with the model UID returned from launching the model )	2025-01-15 10:11:26 -05:00
Syed Baqar Abbas	4278046329	[fix] Convert table names to list for compatibility in SQLDatabase (#29229 ) - [langchain_community.utilities.SQLDatabase] [fix] Convert table names to list for compatibility in SQLDatabase: - The issue #29227 is being fixed here - The "package" modified is community - The issue lay in this block of code: `44b41b699c/libs/community/langchain_community/utilities/sql_database.py (L72-L77)` - Description: When the SQLDatabase is initialized, it calls `self._inspector.get_table_names(schema=schema)`, which is expected to return a list. However, with some connectors (such as snowflake) the data type returned could be another iterable. This results in a type error when concatenating the table_names to view_names. I have added explicit type casting to prevent this. - Dependencies: None - Twitter handle: @BaqarAbbas2001 ## Additional Information When the following method is called for a Snowflake database: `44b41b699c/libs/community/langchain_community/utilities/sql_database.py (L75)` Snowflake under the hood calls: ```python from snowflake.sqlalchemy.snowdialect import SnowflakeDialect SnowflakeDialect.get_table_names ``` This method returns a `dict_keys()` object, which cannot be concatenated with a list and results in a `TypeError` ### Relevant Library Versions - snowflake-sqlalchemy: 1.7.2 - snowflake-connector-python: 3.12.4 - sqlalchemy: 2.0.20 - langchain_community: 0.3.14	2025-01-15 10:00:03 -05:00
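The failure mode and the cast described in this commit can be reproduced in plain Python, without Snowflake or SQLAlchemy. The sketch below is an illustration only: `combine_names` is a hypothetical helper, not the code in `sql_database.py`, and the dictionary stands in for whatever a dialect's `get_table_names` returns.

```python
from typing import Iterable, List


def combine_names(table_names: Iterable[str], view_names: Iterable[str]) -> List[str]:
    # Casting both sides to list makes the concatenation independent of the
    # iterable type returned by the dialect.
    return list(table_names) + list(view_names)


# Stand-in for a dialect that returns dict keys instead of a list.
table_names = {"orders": 1, "users": 2}.keys()
view_names = ["active_users"]

try:
    table_names + view_names  # dict_keys does not support "+"
    raised = False
except TypeError:
    raised = True

combined = combine_names(table_names, view_names)
```

Casting at the call site keeps the rest of the code working with ordinary lists, whichever connector is in use.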
Jin Hyung Ahn	05554265b4	community: Fix ConfluenceLoader load() failure caused by deleted pages (#29232 ) ## Description This PR modifies the is_public_page function in ConfluenceLoader to prevent exceptions caused by deleted pages during the execution of ConfluenceLoader.process_pages(). Example scenario: Consider the following usage of ConfluenceLoader: ```python import os from langchain_community.document_loaders import ConfluenceLoader loader = ConfluenceLoader( url=os.getenv("BASE_URL"), token=os.getenv("TOKEN"), max_pages=1000, cql=f'type=page and lastmodified >= "2020-01-01 00:00"', include_restricted_content=False, ) # Raised Exception : HTTPError: Outdated version/old_draft/trashed? Cannot find content Please provide valid ContentId. documents = loader.load() ``` If a deleted page exists within the query result, the is_public_page function would previously raise an exception when calling get_all_restrictions_for_content, causing the loader.load() process to fail for all pages. By adding a pre-check for the page's "current" status, unnecessary API calls to get_all_restrictions_for_content for non-current pages are avoided. This fix ensures that such pages are skipped without affecting the rest of the loading process. ## Issue N/A (No specific issue number) ## Dependencies No new dependencies are introduced with this change. ## Twitter handle [@zenoengine](https://x.com/zenoengine)	2025-01-15 09:56:23 -05:00
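The pre-check described in this commit can be sketched as follows. This is a simplified illustration, not the `ConfluenceLoader` implementation: `select_public_pages` and `fetch_restrictions` are hypothetical names, pages are plain dictionaries with `id` and `status` keys, and an empty restriction list is assumed to mean the page is public.

```python
from typing import Callable, Dict, List


def select_public_pages(
    pages: List[Dict], fetch_restrictions: Callable[[str], List[str]]
) -> List[Dict]:
    """Keep current, unrestricted pages; skip everything else."""
    kept = []
    for page in pages:
        # Pre-check: never query restrictions for trashed or draft pages,
        # because that lookup is the call that fails for them.
        if page.get("status") != "current":
            continue
        if not fetch_restrictions(page["id"]):
            kept.append(page)
    return kept


calls: List[str] = []


def fake_fetch_restrictions(page_id: str) -> List[str]:
    # Hypothetical stand-in for the restrictions API: it fails for the
    # deleted page, mimicking the HTTPError quoted in the commit message.
    calls.append(page_id)
    if page_id == "2":
        raise RuntimeError("Cannot find content")
    return ["restricted-user"] if page_id == "3" else []


pages = [
    {"id": "1", "status": "current"},
    {"id": "2", "status": "trashed"},
    {"id": "3", "status": "current"},
]

public = select_public_pages(pages, fake_fetch_restrictions)
```

Because the status check runs first, one deleted page no longer aborts loading for the remaining pages.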
Mohammad Mohtashim	21eb39dff0	[Community]: AzureOpenAIWhisperParser Authentication Fix (#29135 ) - Description: `AzureOpenAIWhisperParser` authentication fix as stated in the issue. - Issue: #29133	2025-01-15 09:44:53 -05:00
Erick Friis	b05543c69b	packages: disable mongodb for api docs (#29218 )	2025-01-15 02:23:01 +00:00
Erick Friis	30badd7a32	packages: update mongodb folder (#29217 )	2025-01-15 02:01:06 +00:00
pm390	76172511fd	community: Additional parameters for OpenAIAssistantV2Runnable (#29207 ) Description: Added Additional parameters that could be useful for usage of OpenAIAssistantV2Runnable. This change is thought to allow langchain users to set parameters that cannot be set using assistants UI (max_completion_tokens,max_prompt_tokens,parallel_tool_calls) and parameters that could be useful for experimenting like top_p and temperature. This PR originated from the need of using parallel_tool_calls in langchain, this parameter is very important in openAI assistants because without this parameter set to False strict mode is not respected by OpenAI Assistants (https://platform.openai.com/docs/guides/function-calling#parallel-function-calling). > Note: Currently, if the model calls multiple functions in one turn then strict mode will be disabled for those calls. Issue: None Dependencies: openai	2025-01-14 15:53:37 -05:00
Bagatur	4ab04ad6be	docs: oai api ref nit (#29210 )	2025-01-14 17:55:16 +00:00
Michael Chin	d9b856abad	community: Deprecate Amazon Neptune resources in langchain-community (#29191 ) Related: https://github.com/langchain-ai/langchain-aws/pull/322 The legacy `NeptuneOpenCypherQAChain` and `NeptuneSparqlQAChain` classes are being replaced by the new LCEL format chains `create_neptune_opencypher_qa_chain` and `create_neptune_sparql_qa_chain`, respectively, in the `langchain_aws` package. This PR adds deprecation warnings to all Neptune classes and functions that have been migrated to `langchain_aws`. All relevant documentation has also been updated to replace `langchain_community` usage with the new `langchain_aws` implementations. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-14 10:23:34 -05:00
Erick Friis	c55af44711	anthropic: pydantic mypy plugin (#29144 )	2025-01-13 15:32:40 -08:00
ccurme	1bf6576709	cli[patch]: fix anchor links in templates (#29178 ) These are outdated and can break docs builds.	2025-01-13 18:28:18 +00:00
Christopher Varjas	e156b372fb	langchain: support api key argument with OpenAI moderation chain (#29140 ) Description: Makes it possible to instantiate `OpenAIModerationChain` with an `openai_api_key` argument only and no `OPENAI_API_KEY` environment variable defined. Issue: https://github.com/langchain-ai/langchain/issues/25176 Dependencies: `openai` --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-01-13 11:00:02 -05:00
Nikhil Shahi	335ca3a606	docs: add HyperbrowserLoader docs (#29143 ) ### Description This PR adds docs for the [langchain-hyperbrowser](https://pypi.org/project/langchain-hyperbrowser/) package. It includes a document loader that uses Hyperbrowser to scrape or crawl any urls and return formatted markdown or html content as well as relevant metadata. [Hyperbrowser](https://hyperbrowser.ai) is a platform for running and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy to use solutions for any webscraping needs, such as scraping a single page or crawling an entire site. ### Issue None ### Dependencies None ### Twitter Handle `@hyperbrowser`	2025-01-13 10:45:39 -05:00
Tymon Żarski	689592f9bb	community: Fix rank-llm import paths for new 0.20.3 version (#29154 ) - The "community" package is being modified to handle updated import paths for the new `rank-llm` version. --- ## Description This PR updates the import paths for the `rank-llm` package to account for changes introduced in version `0.20.3`. The changes ensure compatibility with both pre- and post-revamp versions of `rank-llm`, specifically version `0.12.8`. Conditional imports are introduced based on the detected version of `rank-llm` to handle different path structures for `VicunaReranker`, `ZephyrReranker`, and `SafeOpenai`. ## Issue RankLLMRerank throws an error when used with GPT (and with other models) if the rank-llm version is > 0.12.8 - #29156 ## Dependencies This change relies on the `packaging` and `pkg_resources` libraries to handle version checks. ## Twitter handle @tymzar	2025-01-13 10:22:14 -05:00
Andrew	0e3115330d	Add additional_instructions on openai assistant runs create. (#29164 ) - Description: In the functions `_create_run` and `_acreate_run`, the parameters passed to the creation of `openai.resources.beta.threads.runs` were limited. Source: ``` def _create_run(self, input: dict) -> Any: params = { k: v for k, v in input.items() if k in ("instructions", "model", "tools", "run_metadata") } return self.client.beta.threads.runs.create( input["thread_id"], assistant_id=self.assistant_id, **params, ) ``` - OpenAI Documentation ([createRun](https://platform.openai.com/docs/api-reference/runs/createRun)) - Full list of parameters `openai.resources.beta.threads.runs` ([source code](https://github.com/openai/openai-python/blob/main/src/openai/resources/beta/threads/runs/runs.py#L91)) - Issue: Fix #17574 - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Co-authored-by: ccurme <chester.curme@gmail.com>	2025-01-13 10:11:47 -05:00
ccurme	e4ceafa1c8	langchain[patch]: update extended tests for compatibility with langchain-openai==0.3 (#29174 )	2025-01-13 15:04:22 +00:00
Priyansh Agrawal	c115c09b6d	community: add missing format specifier in error log in CubeSemanticLoader (#29172 ) - Description: Add a missing format specifier in an error log in `langchain_community.document_loaders.CubeSemanticLoader` - Issue: raises `TypeError: not all arguments converted during string formatting`	2025-01-13 09:32:57 -05:00
ThomasSaulou	349b5c91c2	fix chatperplexity: remove 'stream' from params in _stream method (#29173 ) quick fix chatperplexity: remove 'stream' from params in _stream method	2025-01-13 09:31:37 -05:00
LIU Yuwei	f980144e9c	community: add init for unstructured file loader (#29101 ) ## Description Add `__init__` for unstructured loader of epub/image/markdown/pdf/ppt/word to restrict the input type to `str` or `Path`. In the [signature](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html) these unstructured loaders receive `file_path: str | List[str] | Path | List[Path]`, but actually they only receive `str` or `Path`. ## Issue None ## Dependencies No changes.	2025-01-13 09:26:00 -05:00
Erick Friis	bbc3e3b2cf	openai: disable streaming for o1 by default (#29147 ) Currently 400s https://community.openai.com/t/streaming-support-for-o1-o1-2024-12-17-resulting-in-400-unsupported-value/1085043 o1-mini and o1-preview stream fine	2025-01-11 02:24:11 +00:00
Isaac Francisco	62074bac60	replace all LANGCHAIN_ flags with LANGSMITH_ flags (#29120 )	2025-01-11 01:24:40 +00:00
Bagatur	5c2fbb5b86	docs: Update openai README.md (#29146 )	2025-01-10 17:24:16 -08:00
Erick Friis	0a54aedb85	anthropic: pdf integration test (#29142 )	2025-01-10 21:56:31 +00:00
ccurme	8de8519daf	tests[patch]: release 0.3.8 (#29141 )	2025-01-10 21:53:41 +00:00
Jiang	7d3fb21807	Add lindorm as new integration (#29123 ) The original PR was closed by mistake: [original PR link](https://github.com/langchain-ai/langchain/pull/29085) --------- Co-authored-by: jiangzhijie <jiangzhijie.jzj@alibaba-inc.com>	2025-01-10 16:30:37 -05:00
ccurme	4819b500e8	pinecone[patch]: release 0.2.2 (#29139 )	2025-01-10 14:59:57 -05:00
Ashvin	46fd09ffeb	partner: Update aiohttp in langchain pinecone. (#28863 ) - partner: "Update Aiohttp for resolving vulnerability issue" - Description: I have updated the upper limit of aiohttp from `3.10` to `3.10.5` in the pyproject.toml file of langchain-pinecone. Hopefully this will resolve #28771 . Please review this as I'm quite unsure. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-10 14:54:52 -05:00
ccurme	f3d370753f	xai[minor]: release 0.2 (#29132 ) Update `langchain-openai` to 0.3. See [release notes](https://github.com/langchain-ai/langchain/releases/tag/langchain-openai%3D%3D0.3.0) for details. Should only impact default values of `temperature`, `n`, and `max_retries`.	2025-01-10 11:47:27 -05:00
ccurme	6e63ccba84	openai[minor]: release 0.3 (#29100 ) ## Goal Solve the following problems with `langchain-openai`: - Structured output with `o1` [breaks out of the box](https://langchain.slack.com/archives/C050X0VTN56/p1735232400232099). - `with_structured_output` by default does not use OpenAI’s [structured output feature](https://platform.openai.com/docs/guides/structured-outputs). - We override API defaults for temperature and other parameters. ## Breaking changes: - Default method for structured output is changing to OpenAI’s dedicated [structured output feature](https://platform.openai.com/docs/guides/structured-outputs). For schemas specified via TypedDict or JSON schema, strict schema validation is disabled by default but can be enabled by specifying `strict=True`. - To recover previous default, pass `method="function_calling"` into `with_structured_output`. - Models that don’t support `method="json_schema"` (e.g., `gpt-4` and `gpt-3.5-turbo`, currently the default model for ChatOpenAI) will raise an error unless `method` is explicitly specified. - To recover previous default, pass `method="function_calling"` into `with_structured_output`. - Schemas specified via Pydantic `BaseModel` that have fields with non-null defaults or metadata (like min/max constraints) will raise an error. - To recover previous default, pass `method="function_calling"` into `with_structured_output`. - `strict` now defaults to False for `method="json_schema"` when schemas are specified via TypedDict or JSON schema. - To recover previous behavior, use `with_structured_output(schema, strict=True)` - Schemas specified via Pydantic V1 will raise a warning (and use `method="function_calling"`) unless `method` is explicitly specified. - To remove the warning, pass `method="function_calling"` into `with_structured_output`. - Streaming with default structured output method / Pydantic schema no longer generates intermediate streamed chunks. 
- To recover previous behavior, pass `method="function_calling"` into `with_structured_output`. - We no longer override default temperature (was 0.7 in LangChain, now will follow OpenAI, currently 1.0). - To recover previous behavior, initialize `ChatOpenAI` or `AzureChatOpenAI` with `temperature=0.7`. - Note: conceptually there is a difference between forcing a tool call and forcing a response format. Tool calls may have more concise arguments vs. generating content adhering to a schema. Prompts may need to be adjusted to recover desired behavior. --------- Co-authored-by: Jacob Lee <jacoblee93@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2025-01-10 10:50:32 -05:00
ccurme	815bfa1913	openai[patch]: support streaming with json_schema response format (#29044 ) - Stream JSON string content. Final chunk includes parsed representation (following OpenAI [docs](https://platform.openai.com/docs/guides/structured-outputs#streaming)). - Mildly (?) breaking change: if you were using streaming with `response_format` before, usage metadata will disappear unless you set `stream_usage=True`. ## Response format Before: ![Screenshot 2025-01-06 at 11 59 01 AM](https://github.com/user-attachments/assets/e54753f7-47d5-421d-b8f3-172f32b3364d) After: ![Screenshot 2025-01-06 at 11 58 13 AM](https://github.com/user-attachments/assets/34882c6c-2284-45b4-92f7-5b5b69896903) ## with_structured_output For pydantic output, behavior of `with_structured_output` is unchanged (except for warning disappearing), because we pluck the parsed representation straight from OpenAI, and OpenAI doesn't return it until the stream is completed. Open to alternatives (e.g., parsing from content or intermediate dict chunks generated by OpenAI). Before: ![Screenshot 2025-01-06 at 12 38 11 PM](https://github.com/user-attachments/assets/913d320d-f49e-4cbb-a800-b394ae817fd1) After: ![Screenshot 2025-01-06 at 12 38 58 PM](https://github.com/user-attachments/assets/f7a45dd6-d886-48a6-8d76-d0e21ca767c6)	2025-01-09 10:32:30 -05:00
Panos Vagenas	858f655a25	docs: add Docling loader docs (#29104 ) ### Description This adds the docs for the Docling document loader. [Docling](https://github.com/DS4SD/docling) parses PDF, DOCX, PPTX, HTML, and other formats into a rich unified representation including document layout, tables etc., making them ready for generative AI workflows like RAG. Some references: - https://research.ibm.com/blog/docling-generative-AI - https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai - [Docling Technical Report](https://arxiv.org/abs/2408.09869) The introduced `DoclingLoader` enables users to: - use various document types in their LLM applications with ease and speed, and - leverage Docling's rich representation for advanced, document-native grounding. ### Issue Replacing PR #27987 as discussed with @efriis [here](https://github.com/langchain-ai/langchain/pull/27987#issuecomment-2489354930). ### Dependencies None --------- Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-01-09 10:15:35 -05:00
Joshua Campbell	00dcc44739	Langchain_community: Fix issue with missing backticks in arango client (#29110 ) - Description: Adds backticks to generate_schema function in the arango graph client - Issue: We experienced an issue with the generate schema function when talking to our arango database where these backticks were missing - Dependencies: none - Twitter handle: @anangelofgrace	2025-01-09 10:00:10 -05:00
LIU Yuwei	2b09f798e1	community: add init for `UnstructuredHTMLLoader` to solve pathlib paths (#29091 ) ## Description Add `__init__` for `UnstructuredHTMLLoader` to restrict the input type to `str` or `Path`, and transfer the `self.file_path` to `str` just like `UnstructuredXMLLoader` does. ## Issue Fix #29090 ## Dependencies No changes.	2025-01-08 10:19:27 -05:00
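The normalization described in this commit can be sketched as follows. This is a minimal illustration, not the actual `UnstructuredHTMLLoader` code: the class name `HtmlLoaderSketch` is hypothetical, and it only shows the idea of accepting `str` or `Path` and storing a `str`.

```python
from pathlib import Path
from typing import Union


class HtmlLoaderSketch:
    """Hypothetical loader that accepts str or Path and stores a str path."""

    def __init__(self, file_path: Union[str, Path]) -> None:
        # Restrict the input type, as the commit message describes.
        if not isinstance(file_path, (str, Path)):
            raise TypeError("file_path must be a str or pathlib.Path")
        # Normalize to str so downstream code sees a single type.
        self.file_path = str(file_path)


loader = HtmlLoaderSketch(Path("docs") / "index.html")
```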
Jin Hyung Ahn	c8ca1cd42f	community: fix "confluence-loader" enable include_labels for documents loaded via CQL (#29089 ) ## Description This PR enables label inclusion for documents loaded via CQL in the confluence-loader. - Updated _lazy_load to pass the include_labels parameter instead of False in process_pages calls for documents loaded via CQL. - Ensured that labels can now be fetched and added to the metadata for documents queried with cql. ## Related Modification History This PR builds on the previous functionality introduced in [#28259](https://github.com/langchain-ai/langchain/pull/28259), which added support for including labels with the include_labels option. However, this functionality did not work as expected for CQL queries, and this PR fixes that issue. If the False handling was intentional due to another issue, please let me know. I have verified with our Confluence instance that this change allows labels to be correctly fetched for documents loaded via CQL. ## Issue Fixes #29088 ## Dependencies No changes. ## Twitter Handle [@zenoengine](https://x.com/zenoengine)	2025-01-08 10:16:39 -05:00
Inah Jeon	9d290abccd	partner: Update Upstage Model Names and Remove Deprecated Model (#29093 ) This PR updates model names in the upstage library to reflect the latest naming conventions and removes deprecated models. Changes: Renamed Models: - `solar-1-mini-chat` -> `solar-mini` - `solar-1-mini-embedding-query` -> `embedding-query` Removed Deprecated Models: - `layout-analysis` (replaced by `document-parse`) Reference: - https://console.upstage.ai/docs/getting-started/overview - https://github.com/langchain-ai/langchain-upstage/releases/tag/libs%2Fupstage%2Fv0.5.0	2025-01-08 10:13:22 -05:00
Prashanth Rao	b1dafaef9b	Kùzu package integration docs (#29076 ) ## Langchain Kùzu ### Description This PR adds docs for the `langchain-kuzu` package [on PyPI](https://pypi.org/project/langchain-kuzu/) that was recently published, allowing Kùzu users to more easily use and work with LangChain QA chains. The package will also make it easier for the Kùzu team to continue supporting and updating the integration over future releases. ### Twitter Handle Please tag [@kuzudb](https://x.com/kuzudb) on Twitter once this PR is merged, so LangChain users can be notified! --------- Co-authored-by: Erick Friis <erickfriis@gmail.com>	2025-01-08 01:14:00 +00:00
Erick Friis	cc0f81f40f	partners/groq: release 0.2.3 (#29081 )	2025-01-07 23:36:51 +00:00
Erick Friis	fcc9cdd100	multiple: disable socket for unit tests (#29080 )	2025-01-07 15:31:50 -08:00
Erick Friis	539ebd5431	groq: user agent (#29079 )	2025-01-07 23:21:57 +00:00
Erick Friis	c5bee0a544	pinecone: bump core version (#29077 )	2025-01-07 20:23:33 +00:00
Cory Waddingham	ce9e9f9314	pinecone: Review pinecone tests (#29073 ) Title: langchain-pinecone: improve test structure and async handling Description: This PR improves the test infrastructure for the langchain-pinecone package by: 1. Implementing LangChain's standard test patterns for embeddings 2. Adding comprehensive configuration testing 3. Improving async test coverage 4. Fixing integration test issues with namespaces and async markers The changes make the tests more robust, maintainable, and aligned with LangChain's testing standards while ensuring proper async behavior in the embeddings implementation. Key improvements: - Added standard EmbeddingsTests implementation - Split custom configuration tests into a separate test class - Added proper async test coverage with pytest-asyncio - Fixed namespace handling in vector store integration tests - Improved test organization and documentation Dependencies: None (uses existing test dependencies) Tests and Documentation: - ✅ Added standard test implementation following LangChain's patterns - ✅ Added comprehensive unit tests for configuration and async behavior - ✅ All tests passing locally - No documentation changes needed (internal test improvements only) Twitter handle: N/A --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-07 11:46:30 -08:00
Philippe PRADOS	2921597c71	community[patch]: Refactoring PDF loaders: 01 prepare (#29062 ) - Refactoring PDF loaders step 1: "community: Refactoring PDF loaders to standardize approaches" - Description: Declare CloudBlobLoader in __init__.py. file_path is Union[str, PurePath] everywhere - Twitter handle: pprados This is one part of a larger Pull Request (PR) that is too large to be submitted all at once. This specific part focuses on preparing the update of all parsers. For more details, see [PR 28970](https://github.com/langchain-ai/langchain/pull/28970). @eyurtsev it's the start of a PR series.	2025-01-07 11:00:04 -05:00
ccurme	55677e31f7	text-splitters[patch]: release 0.3.5 (#29054 ) Resolves https://github.com/langchain-ai/langchain/issues/29053	2025-01-07 09:48:26 -05:00
Erick Friis	187131c55c	Revert "integrations[patch]: remove non-required chat param defaults" (#29048 ) Reverts langchain-ai/langchain#26730 discuss best way to release default changes (esp openai temperature)	2025-01-06 14:45:34 -08:00
Bagatur	3d7ae8b5d2	integrations[patch]: remove non-required chat param defaults (#26730 ) anthropic: - max_retries openai: - n - temperature - max_retries fireworks - temperature groq - n - max_retries - temperature mistral - max_retries - timeout - max_concurrent_requests - temperature - top_p - safe_mode --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-06 22:26:22 +00:00
UV	b9db8e9921	DOC: Improve human input prompt in FewShotChatMessagePromptTemplate example (#29023 ) Fixes #29010 This PR updates the example for FewShotChatMessagePromptTemplate by modifying the human input prompt to include a more descriptive and user-friendly question format ('What is {input}?') instead of just '{input}'. This change enhances clarity and usability in the documentation example. Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-06 12:29:15 -08:00
ccurme	1f78d4faf4	voyageai[patch]: release 0.1.4 (#29046 )	2025-01-06 20:20:19 +00:00
Eugene Evstafiev	6a152ce245	docs: add langchain-pull-md Markdown loader (#29024 ) - [x] PR title: "docs: add langchain-pull-md Markdown loader" - [x] PR message: - Description: This PR introduces the `langchain-pull-md` package to the LangChain community. It includes a new document loader that utilizes the pull.md service to convert URLs into Markdown format, particularly useful for handling web pages rendered with JavaScript frameworks like React, Angular, or Vue.js. This loader helps in efficient and reliable Markdown conversion directly from URLs without local rendering, reducing server load. - Issue: NA - Dependencies: requests >=2.25.1 - Twitter handle: https://x.com/eugeneevstafev?s=21 - [x] Add tests and docs: 1. Added unit tests to verify URL checking and conversion functionalities. 2. Created a comprehensive example notebook detailing the usage of the new loader. - [x] Lint and test: - Completed local testing using `make format`, `make lint`, and `make test` commands as per the LangChain contribution guidelines. Related Links: - [Package Repository](https://github.com/chigwell/langchain-pull-md) - [PyPI Package](https://pypi.org/project/langchain-pull-md/) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-06 19:32:43 +00:00
Ashvin	20a715a103	community: Fix redundancy in code. (#29022 ) In my previous PR (#28953), I added an unwanted condition for validating the Azure ML Endpoint. In this PR, I have rectified the issue.	2025-01-06 12:58:16 -05:00
Adrián Panella	acddfc772e	core: allow artifact in create_retriever_tool (#28903 ) Add option to return content and artifacts, to also be able to access the full info of the retrieved documents. They are returned as a list of dicts in the `artifacts` property if parameter `response_format` is set to `"content_and_artifact"`. Defaults to `"content"` to keep current behavior. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-03 22:10:31 +00:00
ccurme	3e618b16cd	community[patch]: release 0.3.14 (#29019 )	2025-01-03 15:34:24 -05:00
ccurme	18eb9c249d	langchain[patch]: release 0.3.14 (#29018 )	2025-01-03 15:15:44 -05:00
ccurme	8e50e4288c	core[patch]: release 0.3.29 (#29017 )	2025-01-03 14:58:39 -05:00
ccurme	85403bfa99	core[patch]: substantially speed up @deprecated (#29016 ) Resolves https://github.com/langchain-ai/langchain/issues/26918 Unit tests don't raise any additional `LangChainDeprecationWarning`. Would like guidance on how to test this more thoroughly if needed. Note: speed up for `bind_tools` path is shown below. This is redundant with the speedup in https://github.com/langchain-ai/langchain/pull/29015. I include it for demonstration purposes. Before: ![Screenshot 2025-01-03 at 12 54 50 PM](https://github.com/user-attachments/assets/87f289eb-4cad-4304-85f7-5c58c59080f1) After: ![Screenshot 2025-01-03 at 12 55 35 PM](https://github.com/user-attachments/assets/95ad0506-e1d1-4c5c-bb27-6a634d8810c9)	2025-01-03 14:38:53 -05:00
ccurme	4bb391fd4e	core[patch]: remove deprecated functions from tool binding hotpath (#29015 ) (Inspired by https://github.com/langchain-ai/langchain/issues/26918) We rely on some deprecated public functions in the hot path for tool binding (`convert_pydantic_to_openai_function`, `convert_python_function_to_openai_function`, and `format_tool_to_openai_function`). My understanding is that what is deprecated is not the functionality they implement, but use of them in the public API -- we expect to continue to rely on them. Here we update these functions to be private and not deprecated. We keep the public, deprecated functions as simple wrappers that can be safely deleted. The `@deprecated` wrapper adds considerable latency due to its use of the `inspect` module. This update speeds up `bind_tools` by a factor of ~100x: Before: ![Screenshot 2025-01-03 at 11 22 55 AM](https://github.com/user-attachments/assets/94b1c433-ce12-406f-b64c-ca7103badfe0) After: ![Screenshot 2025-01-03 at 11 23 41 AM](https://github.com/user-attachments/assets/02d0deab-82e4-45ca-8cc7-a20b91a5b5db) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-03 19:29:01 +00:00
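The pattern this commit describes (a private, non-deprecated implementation plus a thin public deprecated wrapper) can be sketched with the standard library alone. The names `_convert_to_schema`, `convert_to_schema`, and `bind_tools_sketch` are hypothetical stand-ins, and `warnings.warn` stands in for LangChain's own `@deprecated` decorator.

```python
import warnings
from typing import Dict, List, Tuple


def _convert_to_schema(name: str, description: str) -> Dict[str, object]:
    """Private implementation: no deprecation machinery on the hot path."""
    return {"name": name, "description": description, "parameters": {}}


def convert_to_schema(name: str, description: str) -> Dict[str, object]:
    """Public wrapper kept for backwards compatibility; safe to delete later."""
    warnings.warn(
        "convert_to_schema is deprecated",
        DeprecationWarning,
        stacklevel=2,
    )
    return _convert_to_schema(name, description)


def bind_tools_sketch(tools: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Internal caller uses the private function, so no warning is emitted."""
    return [_convert_to_schema(name, desc) for name, desc in tools]
```

Internal callers pay only for the conversion itself, while external users of the public name still see the deprecation warning.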
Eugene Evstafiev	a86904e735	docs: fix typo (#29012 ) - Description: a minor typo fix - Issue: NA - Dependencies: NA - Twitter handle: NA	2025-01-03 09:52:24 -08:00
Erick Friis	919d1c7da6	box: remove box readme for api docs build (#29014 )	2025-01-03 09:50:04 -08:00
Erick Friis	d8bc556c94	packages: update box location (#29013 )	2025-01-03 09:45:13 -08:00
Amaan	8d7daa59fb	docs: add langchain dappier retriever integration notebooks (#28931 ) Add a retriever to interact with Dappier APIs with an example notebook. The retriever can be invoked with: ```python from langchain_dappier import DappierRetriever retriever = DappierRetriever( data_model_id="dm_01jagy9nqaeer9hxx8z1sk1jx6", k=5 ) retriever.invoke("latest tech news") ``` To retrieve 5 documents related to latest news in the tech sector. The included notebook also includes deeper details about controlling filters such as selecting a data model, number of documents to return, site domain reference, minimum articles from the reference domain, and search algorithm, as well as including the retriever in a chain. The integration package can be found over here - https://github.com/DappierAI/langchain-dappier	2025-01-03 10:21:41 -05:00
ccurme	0185010b88	community[patch]: additional check for prompt caching support (#29008 ) Prompt caching explicitly excludes `gpt-4o-2024-05-13`: https://platform.openai.com/docs/guides/prompt-caching Resolves https://github.com/langchain-ai/langchain/issues/28997	2025-01-03 10:14:07 -05:00
Tari Yekorogha	ba9dfd9252	docs: Add FalkorDB Chat Message History and Update Package Registry (#28914 ) This commit updates the documentation and package registry for the FalkorDB Chat Message History integration. Changes: - Added a comprehensive example notebook falkordb_chat_message_history.ipynb demonstrating how to use FalkorDB for session-based chat message storage. - Added a provider notebook for FalkorDB - Updated libs/packages.yml to register FalkorDB as an integration package, following LangChain's new guidelines for community integrations. Notes: - This update aligns with LangChain's process for registering new integrations via documentation updates and package registry modifications. - No functional or core package changes were made in this commit. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-02 15:46:47 -05:00
Ashvin	d26c102a5a	community: Update azureml endpoint (#28953 ) - In this PR, I have updated the AzureML Endpoint with the latest endpoint. - Description: I have changed the existing `/chat/completions` to `/models/chat/completions` in libs/community/langchain_community/llms/azureml_endpoint.py - Issue: #25702 --------- Co-authored-by: = <=>	2025-01-02 14:47:02 -05:00
ccurme	7c28321f04	core[patch]: fix deprecation admonition in API ref (#28992 ) Before: ![Screenshot 2025-01-02 at 1 49 30 PM](https://github.com/user-attachments/assets/cb30526a-fc0b-439f-96d1-962c226d9dc7) After: ![Screenshot 2025-01-02 at 1 49 38 PM](https://github.com/user-attachments/assets/32c747ea-6391-4dec-b778-df457695d197)	2025-01-02 14:37:55 -05:00
Mohammad Mohtashim	0e74757b0a	(Community): `DuckDuckGoSearchAPIWrapper` backend changed from `api` to `auto` (#28961 ) - Description: `DuckDuckGoSearchAPIWrapper` default value for backend has been changed to avoid User Warning - Issue: #28957	2025-01-02 14:08:22 -05:00
Mohammad Mohtashim	aa551cbcee	(Core) Small Change in Docstring for method `partial` for `BasePromptTemplate` (#28969 ) - Description: Very small change in Docstring for `BasePromptTemplate` - Issue: #28966	2025-01-02 12:16:30 -05:00
minpeter	a873e0fbfb	community: update documentation and model IDs for FriendliAI provider (#28984 ) ### Description - In the example, remove `llama-2-13b-chat`, `mixtral-8x7b-instruct-v0-1`. - Fix llm friendli streaming implementation. - Update examples in documentation and remove duplicates. ### Issue N/A ### Dependencies None ### Twitter handle `@friendliai`	2025-01-02 12:15:59 -05:00
Hrishikesh Kalola	437ec53e29	langchain.agents: corrected documentation (#28986 ) Description: This PR updates the codebase to reflect the deprecation of the AgentType feature. It includes the following changes: Documentation Update: Added a deprecation notice to the AgentType class comment. Provided a reference to the official LangChain migration guide for transitioning to LangGraph agents. Reference Link: https://python.langchain.com/docs/how_to/migrate_agent/ Twitter handle: @hrrrriiiishhhhh --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-01-02 12:13:42 -05:00
Mohammad Mohtashim	49a26c1fca	(Community): Fix Keyword argument for `AzureAIDocumentIntelligenceParser` (#28959 ) - Description: Fix the `body` keyword argument for `AzureAIDocumentIntelligenceParser` - Issue: #28948	2025-01-02 11:27:12 -05:00
ccurme	efc687a13b	community[patch]: fix instantiation for Slack tools (#28990 ) Believe the current implementation raises PydanticUserError following [this](https://github.com/pydantic/pydantic/releases/tag/v2.10.1) Pydantic release. Resolves https://github.com/langchain-ai/langchain/issues/28989	2025-01-02 16:14:17 +00:00
Yunlin Mao	c59093d67f	docs: add modelscope endpoint (#28941 ) ## Description To integrate ModelScope inference API endpoints for both Embeddings, LLMs and ChatModels, install the package `langchain-modelscope-integration` (as discussed in issue #28928 ). This is necessary because the package name `langchain-modelscope` was already registered by another party. ModelScope is a premier platform designed to connect model checkpoints with model applications. It provides the necessary infrastructure to share open models and promote model-centric development. For more information, visit GitHub page: [ModelScope](https://github.com/modelscope).	2025-01-02 10:08:41 -05:00
Bagatur	1c797ac68f	infra: speed up unit tests (#28974 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-02 04:13:08 +00:00
Morgante Pell	79fc9b6b04	cli: bump gritql version (#28981 ) Description: bump gritql dependency, to use new binary names from [here](https://github.com/getgrit/gritql/pull/565) Issue: fixes https://github.com/langchain-ai/langchain/issues/27822	2025-01-01 20:02:46 -08:00
Bagatur	edbe7d5f5e	core,anthropic[patch]: fix with_structured_output typing (#28950 )	2024-12-28 15:46:51 -05:00
dabzr	ffbe5b2106	partners: fix default value for stop_sequences in ChatGroq (#28924 ) - Description: This PR addresses an issue with the `stop_sequences` field in the `ChatGroq` class. Currently, the field is defined as: ```python stop: Optional[Union[List[str], str]] = Field(None, alias="stop_sequences") ``` This causes the language server (LSP) to raise an error indicating that the `stop_sequences` parameter must be implemented. The issue occurs because `Field(None, alias="stop_sequences")` is different compared to `Field(default=None, alias="stop_sequences")`. ![image](https://github.com/user-attachments/assets/bfc34cb1-c664-4c31-b856-8f18419c7350) To resolve the issue, the field is updated to: ```python stop: Optional[Union[List[str], str]] = Field(default=None, alias="stop_sequences") ``` While this issue does not affect runtime behavior, it ensures compatibility with LSPs and improves the development experience. - Issue: N/A - Dependencies: None	2024-12-26 16:43:34 -05:00
Andy Wermke	5940ed3952	community: Fix error handling bug in ChatDeepInfra (#28918 ) In the async ClientResponse, `response.text` is not a string property, but an asynchronous function returning a string.	2024-12-26 14:45:12 -05:00
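The kind of bug this commit fixes can be illustrated with a small stand-in. `FakeClientResponse` is a hypothetical class that mimics only the relevant behavior, namely that `text` is a coroutine method and must be awaited. It is not the real client response type.

```python
import asyncio


class FakeClientResponse:
    """Hypothetical stand-in for an async HTTP response object."""

    def __init__(self, body: str, status: int) -> None:
        self._body = body
        self.status = status

    async def text(self) -> str:
        # text() is a coroutine function, not a string property.
        return self._body


async def describe_error(response: FakeClientResponse) -> str:
    # Buggy form: f"{response.text}" would embed the repr of a bound method.
    body = await response.text()
    return f"HTTP {response.status}: {body}"


message = asyncio.run(describe_error(FakeClientResponse("rate limited", 429)))
```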
zep.hyr	7b4d2d5d44	Community : Add cost information for missing OpenAI model (#28882 ) In the previous commit, the cached model key for this model was omitted. When using the "gpt-4o-2024-11-20" model, the token count in the callback appeared as 0, and the cost was recorded as 0. We add model and cost information so that the token count and cost can be displayed for the respective model. - The message before modification is as follows. ``` Tokens Used: 0 Prompt Tokens: 0 Prompt Tokens Cached: 0 Completion Tokens: 0 Reasoning Tokens: 0 Successful Requests: 0 Total Cost (USD): $0.0 ``` - The message after modification is as follows. ``` Tokens Used: 3783 Prompt Tokens: 3625 Prompt Tokens Cached: 2560 Completion Tokens: 158 Reasoning Tokens: 0 Successful Requests: 1 Total Cost (USD): $0.010642500000000001 ```	2024-12-26 14:28:31 -05:00
Erick Friis	3726a944c0	docs: sorted by downloads [wip] (#28869 )	2024-12-23 13:13:35 -08:00
Andreas Motl	6352edf77f	docs: CrateDB: Register package `langchain-cratedb`, and add minimal "provider" documentation (#28877 ) Hi Erick. Coming back from a previous attempt, we now made a separate package for the CrateDB adapter, called `langchain-cratedb`, as advised. Other than registering the package within `libs/packages.yml`, this patch includes a minimal amount of documentation to accompany the advent of this new package. Let us know about any mistakes we made, or changes you would like to see. Thanks, Andreas. ## About - Description: Register a new database adapter package, `langchain-cratedb`, providing traditional vector store, document loader, and chat message history features for a start. - Addressed to: @efriis, @eyurtsev - References: GH-27710 - Preview: [Providers » More » CrateDB](https://langchain-git-fork-crate-workbench-register-la-4bf945-langchain.vercel.app/docs/integrations/providers/cratedb/) ## Status - PyPI: https://pypi.org/project/langchain-cratedb/ - GitHub: https://github.com/crate/langchain-cratedb - Documentation (CrateDB): https://cratedb.com/docs/guide/integrate/langchain/ - Documentation (LangChain): _This PR._ ## Backlog? Is this applicable for this kind of patch? > - [ ] Add tests and docs: If you're adding a new integration, please include > 1. a test for the integration, preferably unit tests that do not rely on network access, > 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. ## Q&A 1. Notebooks that use the LangChain CrateDB adapter are currently at [CrateDB LangChain Examples](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain), and the documentation refers to them. Because they are derived from very old blueprints coming from LangChain 0.0.x times, we guess they need a refresh before adding them to `docs/docs/integrations`. 
Is it applicable to merge this minimal package registration + documentation patch, which already includes valid code snippets in `cratedb.mdx`, and add corresponding notebooks on behalf of a subsequent patch later? 2. How would it work getting into the tabular list of _Integration Packages_ enumerated on the [documentation entrypoint page about Providers](https://python.langchain.com/docs/integrations/providers/)? /cc Please also review, @ckurze, @wierdvanderhaar, @kneth, @simonprickett, if you can find the time. Thanks!	2024-12-23 10:55:44 -05:00
Wang Ran (汪然)	e5c9da3eb6	core[patch]: remove redundant imports (#28861 ) `Graph` has been imported at Line: 62	2024-12-23 10:31:23 -05:00
Adrián Panella	8d9907088b	community(azuresearch): allow to use any valid credential (#28873 ) Add option to use any valid credential type. Differentiates async cases needed by Azure Search. This could replace the use of a static token	2024-12-23 10:05:48 -05:00
Mohammad Mohtashim	41b6a86bbe	Community: LlamaCppEmbeddings `embed_documents` and `embed_query` (#28827 ) - Description: `embed_documents` and `embed_query` were raising the error described in the issue. The `Llama` client returns the embeddings in a nested list, which the current implementation did not account for, so the stated error was raised. - Issue: #28813 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-23 09:50:22 -05:00
Darien Schettler	32917a0b98	Update dataframe.py (#28871 ) community: optimize DataFrame document loader Description: Simplify the `lazy_load` method in the DataFrame document loader by combining text extraction and metadata cleanup into a single operation. This makes the code more concise while maintaining the same functionality. Issue: N/A Dependencies: None Twitter handle: N/A	2024-12-22 19:16:16 -05:00
yeounhak	f38fc89f35	community: Corrected aload func to be asynchronous from webBaseLoader (#28337 ) - Description: The aload function, contrary to its name, is not an asynchronous function, so it cannot work concurrently with other asynchronous functions. - Issue: #28336 - Test: Done - Docs: [here](`e0a95e5646/docs/docs/integrations/document_loaders/web_base.ipynb (L201)`) - Lint: All checks passed --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-20 14:42:52 -05:00
Mohammad Mohtashim	8cf5f20bb5	`required` tool_choice added for ChatHuggingFace (#28851 ) - Description: HuggingFace Inference Client V3 now supports `required` as tool_choice which has been added. - Issue: #28842	2024-12-20 12:06:04 -05:00
Sylvain DEPARTE	fcba567a77	partners: allow to set Prefix in AIMessage (for MistralAI) (#28846 ) Description: Added ability to set `prefix` attribute to prevent error : ``` httpx.HTTPStatusError: Error response 400 while fetching https://api.mistral.ai/v1/chat/completions: {"object":"error","message":"Expected last role User or Tool (or Assistant with prefix True) for serving but got assistant","type":"invalid_request_error","param":null,"code":null} ``` Co-authored-by: Sylvain DEPARTE <sylvain.departe@wizbii.com>	2024-12-20 11:09:45 -05:00
Jacob Mansdorfer	6d81137325	community: adding langchain-predictionguard partner package documentation (#28832 ) - Description: This PR adds documentation for the langchain-predictionguard package to the main langchain repo and deprecates the current Prediction Guard LLMs package. The LLMs package was previously broken, so I also updated it one final time to allow it to continue working from this point onward. This enables users to chat with LLMs through the Prediction Guard ecosystem. - Package Links: - [PyPI](https://pypi.org/project/langchain-predictionguard/) - [Github Repo](https://www.github.com/predictionguard/langchain-predictionguard) - Issue: None - Dependencies: None - Twitter handle: [@predictionguard](https://x.com/predictionguard) - Add tests and docs: All docs have been added for the partner package, and the current LLMs package test was updated to reflect changes. - Lint and test: Linting tests are all passing. --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-20 10:51:44 -05:00
ccurme	f0e858b4e3	core[patch]: release 0.3.28 (#28837 )	2024-12-19 17:52:32 -05:00
ccurme	137d1e9564	langchain[patch]: fix test following update to langchain-openai (#28838 )	2024-12-19 22:39:48 +00:00
Emmanuel Leroy	c8db5a19ce	langchain_community.chat_models.oci_generative_ai: Fix a bug when using optional parameters in tools (#28829 ) When using tools with optional parameters, the parameter `type` is no longer available since the langchain update to 0.3 (because of the pydantic upgrade?) and there is now an `anyOf` field instead. This results in the `type` being `None` in the chat request for the tool parameter, and the LLM call fails with the error: ``` oci.exceptions.ServiceError: {'target_service': 'generative_ai_inference', 'status': 400, 'code': '400', 'opc-request-id': '...', 'message': 'Parameter definition must have a type.', 'operation_name': 'chat' ... } ``` Example code that fails: ``` from langchain_community.chat_models.oci_generative_ai import ChatOCIGenAI from langchain_core.tools import tool from typing import Optional llm = ChatOCIGenAI( model_id="cohere.command-r-plus", service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com", compartment_id="ocid1.compartment.oc1...", auth_profile="your_profile", auth_type="API_KEY", model_kwargs={"temperature": 0, "max_tokens": 3000}, ) @tool def test(example: Optional[str] = None): """This is the tool to use to test things Args: example: example variable, defaults to None """ return "this is a test" llm_with_tools = llm.bind_tools([test]) result = llm_with_tools.invoke("can you make a test for g") ``` This PR sets the param type to `any` in that case, and fixes the problem. Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-19 22:17:34 +00:00
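The fallback described in this commit can be sketched on a plain dictionary. The helper `resolve_param_type` is hypothetical, and the sample schema is an assumption about what an `Optional[str]` parameter looks like after the pydantic upgrade (an `anyOf` entry with no top-level `type`). The actual change in the OCI chat model may differ in detail.

```python
from typing import Any, Dict


def resolve_param_type(schema: Dict[str, Any]) -> str:
    """Return the declared `type`, or "any" when the schema has none."""
    declared = schema.get("type")
    if declared is not None:
        return declared
    # Optional parameters may carry `anyOf` instead of `type`; fall back to "any".
    return "any"


# Assumed shape of the JSON schema for `example: Optional[str] = None`.
optional_schema = {
    "anyOf": [{"type": "string"}, {"type": "null"}],
    "default": None,
}
required_schema = {"type": "string"}

optional_type = resolve_param_type(optional_schema)
required_type = resolve_param_type(required_schema)
```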
Bagatur	c3ccd93c12	patch openai json mode test (#28831 )	2024-12-19 21:43:32 +00:00
Bagatur	ce6748dbfe	xfail openai image token count test (#28828 )	2024-12-19 21:23:30 +00:00
Anusha Karkhanis	26bdf40072	Langchain_Community: SQL LanguageParser (#28430 ) ## Description (This PR has contributions from @khushiDesai, @ashvini8, and @ssumaiyaahmed). This PR addresses Issue #11229 which addresses the need for SQL support in document parsing. This is integrated into the generic TreeSitter parsing library, allowing LangChain users to easily load codebases in SQL into smaller, manageable "documents." This pull request adds a new ```SQLSegmenter``` class, which provides the SQL integration. ## Issue Issue #11229: Add support for a variety of languages to LanguageParser ## Testing We created a file ```test_sql.py``` with several tests to ensure the ```SQLSegmenter``` is functional. Below are the tests we added: - ```def test_is_valid```: Checks SQL validity. - ```def test_extract_functions_classes```: Extracts individual SQL statements. - ```def test_simplify_code```: Simplifies SQL code with comments. --------- Co-authored-by: Syeda Sumaiya Ahmed <114104419+ssumaiyaahmed@users.noreply.github.com> Co-authored-by: ashvini hunagund <97271381+ashvini8@users.noreply.github.com> Co-authored-by: Khushi Desai <khushi.desai@advantawitty.com> Co-authored-by: Khushi Desai <59741309+khushiDesai@users.noreply.github.com> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-19 20:30:57 +00:00
Bagatur	a7f2148061	openai[patch]: Release 0.2.14 (#28826 )	2024-12-19 11:56:44 -08:00
Bagatur	1378ddfa5f	openai[patch]: type reasoning_effort (#28825 )	2024-12-19 19:36:49 +00:00
Erick Friis	6a37899b39	core: dont mutate tool_kwargs during tool run (#28824 ) fixes https://github.com/langchain-ai/langchain/issues/24621	2024-12-19 18:11:56 +00:00
Qun	033ac41760	fix crash when using create_xml_agent with parameterless function as … (#26002 ) When `create_xml_agent` or `create_json_chat_agent` is used to create an agent and the function corresponding to the tool takes no parameters, the `XMLAgentOutputParser` or `JSONAgentOutputParser` parses the tool input into an empty string, which `BaseTool` then parses into a positional argument. The program crashes because a parameterless function is invoked with a positional argument. Specifically, the code below raises StopIteration in [_parse_input](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/tools/base.py#L419) ```python from langchain import hub from langchain.agents import AgentExecutor, create_json_chat_agent, create_xml_agent from langchain_openai import ChatOpenAI prompt = hub.pull("hwchase17/react-chat-json") llm = ChatOpenAI() # agent = create_xml_agent(llm, tools, prompt) agent = create_json_chat_agent(llm, tools, prompt) agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True) agent_executor.invoke(......) ``` --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-19 13:00:46 -05:00
Luke	f69695069d	text_splitters: Add HTMLSemanticPreservingSplitter (#25911 ) Description: Current HTML splitters rely on secondary use of the `RecursiveCharacterSplitter` to further split the document into manageable chunks. The issue is that this fails to maintain important structures such as tables and lists within HTML. This implementation of an HTML splitter allows the user to define a maximum chunk size, HTML elements to preserve in full, options to preserve `<a>` href links in the output, and custom handlers. The core splitting begins with headers, similar to `HTMLHeaderSplitter`. If these sections exceed `max_chunk_size`, further recursive splitting is triggered. During this splitting, elements listed to preserve are excluded from the splitting process. This can cause chunks to be slightly larger than the max size, depending on the preserved length. However, all contextual relevance of the preserved item remains intact. Custom Handlers: Sometimes, companies such as Atlassian have custom HTML elements that are not parsed by default with `BeautifulSoup`. Custom handlers allow a user to provide a function to be run whenever a specific HTML tag is encountered. This allows the user to preserve and gather information within custom HTML tags that `bs4` might miss during extraction. Dependencies: Users will need to install `bs4` in their project to use this class. I have also added a `how_to` and unit tests, which require `bs4` to run; otherwise they will be skipped. Flowchart of process: ![HTMLSemanticPreservingSplitter](https://github.com/user-attachments/assets/20873c36-22ed-4c80-884b-d3c6f433f5a7) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-19 12:09:22 -05:00
Tommaso De Lorenzo	24bfa062bf	langchain: add support for Google Anthropic Vertex AI model garden provider in init_chat_model (#28177 ) Simple modification to add support for anthropic models deployed in Google Vertex AI model garden in `init_chat_model` importing `ChatAnthropicVertex` - [v] Lint and test	2024-12-19 12:06:21 -05:00
Erick Friis	ff7b01af88	anthropic: less pydantic for client (#28823 )	2024-12-19 08:00:02 -08:00
Erick Friis	f1d783748a	anthropic: sdk bump (#28820 )	2024-12-19 15:39:21 +00:00
Erick Friis	907f36a6e9	fireworks: fix lint (#28821 )	2024-12-19 15:36:36 +00:00
Erick Friis	6526db4871	community: bump core (#28819 )	2024-12-19 06:41:53 -08:00
Vignesh A	4c9acdfbf1	Community : Add OpenAI prompt caching and reasoning tokens tracking (#27135 ) Added Token tracking for OpenAI's prompt caching and reasoning tokens Costs updated from https://openai.com/api/pricing/ usage example ```python from langchain_community.callbacks import get_openai_callback from langchain_openai import ChatOpenAI llm = ChatOpenAI(model_name="o1-mini",temperature=1) with get_openai_callback() as cb: response = llm.invoke("hi "*1500) print(cb) ``` Output ``` Tokens Used: 1720 Prompt Tokens: 1508 Prompt Tokens Cached: 1408 Completion Tokens: 212 Reasoning Tokens: 192 Successful Requests: 1 Total Cost (USD): $0.0049559999999999995 ``` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-19 09:31:13 -05:00
ScriptShi	97f1e1d39f	community: tablestore vector store check the dimension of the embedding when writing it to store. (#28812 ) Added some restrictions to a vectorstore I released in the community before.	2024-12-19 09:30:43 -05:00
Wang Ran (汪然)	f48755d35b	core: typo `Utilities for tests.` -> `Utilities for pydantic.` (#28814 ) Description: typo	2024-12-19 09:26:17 -05:00
Wang Ran (汪然)	51b8ddaf10	core: typo in runnable (#28815 ) Thank you for contributing to LangChain! Description: Typo	2024-12-19 09:25:57 -05:00
Erick Friis	3b036a1cf2	partners/fireworks: release 0.2.6 (#28805 )	2024-12-18 22:48:35 +00:00
Erick Friis	4eb8bf7793	partners/anthropic: release 0.3.1 (#28801 )	2024-12-18 22:45:38 +00:00
Lu Peng	50afa7c4e7	community: add new parameter default_headers (#28700 ) Thank you for contributing to LangChain! - [x] PR title: "package: description" - "community: 1. add new parameter `default_headers` for oci model deployments and oci chat model deployments. 2. updated k parameter in OCIModelDeploymentLLM class." - [x] PR message: - Description: 1. add new parameters `default_headers` for oci model deployments and oci chat model deployments. 2. updated k parameter in OCIModelDeploymentLLM class. - [x] Add tests and docs: 1. unit tests 2. notebook --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-18 22:33:23 +00:00
Erick Friis	cc616de509	partners/xai: release 0.1.1 (#28806 )	2024-12-18 22:15:24 +00:00
Erick Friis	ba8c1b0d8c	partners/groq: release 0.2.2 (#28804 )	2024-12-18 22:12:02 +00:00
Erick Friis	a119cae5bd	partners/mistralai: release 0.2.4 (#28803 )	2024-12-18 22:11:48 +00:00
Erick Friis	514d78516b	partners/ollama: release 0.2.2 (#28802 )	2024-12-18 22:11:08 +00:00
Bagatur	68940dd0d6	openai[patch]: Release 0.2.13 (#28800 )	2024-12-18 22:08:47 +00:00
Erick Friis	4dc28b43ac	community: release 0.3.13 (#28798 )	2024-12-18 21:58:46 +00:00
Bagatur	557f63c2e6	core[patch]: Release 0.3.27 (#28799 )	2024-12-18 21:58:03 +00:00
Bagatur	4a531437bb	core[patch], openai[patch]: Handle OpenAI developer msg (#28794 ) - Convert developer openai messages to SystemMessage - store additional_kwargs={"__openai_role__": "developer"} so that the correct role can be reconstructed if needed - update ChatOpenAI to read in openai_role --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-18 21:54:07 +00:00
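The round trip described in #28794 can be sketched as follows. This is an illustration only: `SystemMessage` here is a plain dataclass stand-in rather than the `langchain_core` class, and the two converter functions are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SystemMessage:
    """Stand-in for a system message; not the langchain_core class."""

    content: str
    additional_kwargs: Dict[str, Any] = field(default_factory=dict)


def from_openai_dict(msg: Dict[str, Any]) -> SystemMessage:
    """Convert an OpenAI-style system/developer dict into a SystemMessage."""
    role = msg["role"]
    if role == "developer":
        # Remember the original role so it can be restored later.
        return SystemMessage(msg["content"], {"__openai_role__": "developer"})
    if role == "system":
        return SystemMessage(msg["content"])
    raise ValueError(f"unsupported role: {role}")


def to_openai_dict(message: SystemMessage) -> Dict[str, Any]:
    """Rebuild the OpenAI-style dict, restoring the stored role if present."""
    role = message.additional_kwargs.get("__openai_role__", "system")
    return {"role": role, "content": message.content}
```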
Erick Friis	079f1d93ab	langchain: release 0.3.13 (#28797 )	2024-12-18 12:32:00 -08:00
Yuxin Chen	3256b5d6ae	text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter (#28373 ) - Description: This PR resolves an issue with the `ExperimentalMarkdownSyntaxTextSplitter` class, which retains the internal state across multiple calls to the `split_text` method. This behaviour caused an unintended accumulation of chunks in `self` variables, leading to incorrect outputs when processing multiple Markdown files sequentially. - Modified `libs\text-splitters\langchain_text_splitters\markdown.py` to reset the relevant internal attributes at the start of each `split_text` invocation. This ensures each call processes the input independently. - Added unit tests in `libs\text-splitters\tests\unit_tests\test_text_splitters.py` to verify the fix and ensure the state does not persist across calls. - Issue: Fixes [#26440](https://github.com/langchain-ai/langchain/issues/26440). - Dependencies: No additional dependencies are introduced with this change. - [x] Unit tests were added to verify the changes. - [x] Updated documentation where necessary. - [x] Ran `make format`, `make lint`, and `make test` to ensure compliance with project standards. --------- Co-authored-by: Angel Chen <angelchen396@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 20:27:59 +00:00
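The fix pattern in #28373 (reset per-call state at the start of `split_text`) can be shown with a toy splitter. `LineChunker` is a hypothetical class written for this illustration, not the `ExperimentalMarkdownSyntaxTextSplitter` code:

```python
from typing import List


class LineChunker:
    """Toy splitter that keeps its working state on `self`."""

    def __init__(self) -> None:
        self.chunks: List[str] = []

    def split_text(self, text: str) -> List[str]:
        # Reset per-call state first. Without this line, chunks from an
        # earlier call would accumulate into the result of a later call.
        self.chunks = []
        for line in text.splitlines():
            if line.strip():
                self.chunks.append(line.strip())
        return self.chunks
```

Reusing one instance for several documents then yields independent results for each call.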
Mohammad Mohtashim	7c8f977695	Community: Fix `with_structured_output` for `ChatSambaNovaCloud` (#28796 ) - Description: `kwargs` was being compared against `None`, which prevented the rest of the code in `with_structured_output` from being executed. The check has been fixed in this PR. - Issue: #28776	2024-12-18 14:35:06 -05:00
V.Prasanna kumar	684b146b18	Fixed adding float values into DynamoDB (#26562 ) - [x] PR title: Add float Message into Dynamo DB - community - [x] PR message: - Description: Pushing float values into DynamoDB raises an error; this is solved by converting them to `str`. - Issue: Float values are not getting pushed - Twitter handle: VpkPrasanna A utility function for the `str` conversion has been added; let me know where to place it, happy to do a commit. This PR comes from a discussion of #26543 @hwchase17 @baskaryan @efriis --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 13:45:00 -05:00
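A conversion of the kind #26562 describes might look like the sketch below. `floats_to_str` is a hypothetical name, not the utility added by the PR; it walks nested dicts and lists and replaces every `float` with its string form, leaving other values untouched.

```python
from typing import Any


def floats_to_str(value: Any) -> Any:
    """Recursively replace float values with their string representation."""
    if isinstance(value, float):
        return str(value)
    if isinstance(value, dict):
        return {key: floats_to_str(item) for key, item in value.items()}
    if isinstance(value, list):
        return [floats_to_str(item) for item in value]
    # ints, strings, bools, None and other types pass through unchanged.
    return value
```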
William FH	50ea1c3ea3	[Core] respect tracing project name cvar (#28792 )	2024-12-18 10:02:02 -08:00
Martin Triska	e6b41d081d	community: DocumentLoaderAsParser wrapper (#27749 ) ## Description This pull request introduces the `DocumentLoaderAsParser` class, which acts as an adapter to transform document loaders into parsers within the LangChain framework. The class enables document loaders that accept a `file_path` parameter to be utilized as blob parsers. This is particularly useful for integrating various document loading capabilities seamlessly into the LangChain ecosystem. When merged together with PR https://github.com/langchain-ai/langchain/pull/27716, it opens options for `SharePointLoader` / `OneDriveLoader` to process any filetype that has a document loader. ### Features - Flexible Parsing: The `DocumentLoaderAsParser` class can adapt any document loader that meets the criteria of accepting a `file_path` argument, allowing for lazy parsing of documents. - Compatibility: The class has been designed to work with various document loaders, making it versatile for different use cases. ### Usage Example To use the `DocumentLoaderAsParser`, you would initialize it with a suitable document loader class and any required parameters. Here’s an example of how to do this with the `UnstructuredExcelLoader`: ```python from langchain_community.document_loaders.blob_loaders import Blob from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser from langchain_community.document_loaders.excel import UnstructuredExcelLoader # Initialize the parser adapter with UnstructuredExcelLoader xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged") # Use parser, for ex. pass it to MimeTypeBasedParser MimeTypeBasedParser( handlers={ "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": xlsx_parser } ) ``` - Dependencies: None - Twitter handle: @martintriska1 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 12:47:08 -05:00
Erick Friis	9b024d00c9	text-splitters: release 0.3.4 (#28795 )	2024-12-18 09:44:36 -08:00
Erick Friis	5cf965004c	core: release 0.3.26 (#28793 )	2024-12-18 17:28:42 +00:00
Mohammad Mohtashim	d49df4871d	[Community]: Image Extraction Fixed for `PDFPlumberParser` (#28491 ) - Description: One-bit images were raising an error, which has been fixed in this PR for `PDFPlumberParser` - Issue: #28480 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 11:45:48 -05:00
binhnd102	f723a8456e	Fixes: community: fix LanceDB return no metadata (#27024 ) - [ x ] Fix the case where LanceDB returns a table without a metadata column - Description: Check the table schema; if it has no metadata column, initialize the Document with the metadata argument set to an empty dict - Issue: https://github.com/langchain-ai/langchain/issues/27005 - [ x ] Add tests and docs --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-18 15:21:28 +00:00
ANSARI MD AAQIB AHMED	91d28ef453	Add langchain-yt-dlp Document Loader Documentation (#28775 ) ## Overview This PR adds documentation for the `langchain-yt-dlp` package, a YouTube document loader that uses `yt-dlp` for YouTube video metadata extraction. ## Changes - Added documentation notebook for YoutubeLoader - Updated packages.yml to include langchain-yt-dlp ## Motivation The existing LangChain YoutubeLoader was unable to fetch YouTube metadata due to changes in YouTube's structure. This package resolves those issues by leveraging the `yt-dlp` library. ## Features - Reliable YouTube metadata extraction ## Related - Package Repository: https://github.com/aqib0770/langchain-yt-dlp - PyPI Package: https://pypi.org/project/langchain-yt-dlp/ --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 10:16:50 -05:00
GITHUBear	33b1fb95b8	partners: langchain-oceanbase Integration (#28782 ) Hi, langchain team! I'm a maintainer of [OceanBase](https://github.com/oceanbase/oceanbase). With the integration guidance, I create a python lib named [langchain-oceanbase](https://github.com/oceanbase/langchain-oceanbase) to integrate `Oceanbase Vector Store` with `Langchain`. So I'd like to add the required docs. I will appreciate your feedback. Thank you! --------- Signed-off-by: shanhaikang.shk <shanhaikang.shk@oceanbase.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-18 14:51:49 +00:00
Rave Harpaz	986b752fc8	Add OCI Generative AI new model and structured output support (#28754 ) - [X] PR title: community: Add new model and structured output support - [X] PR message: - Description: add support for meta llama 3.2 image handling, and JSON mode for structured output - Issue: NA - Dependencies: NA - Twitter handle: NA - [x] Add tests and docs: 1. we have updated our unit tests, 2. no changes required for documentation. - [x] Lint and test: make format, make lint and make test we run successfully --------- Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-18 09:50:25 -05:00
David Pryce-Compson	ef24220d3f	community: adding haiku 3.5 and opus callbacks (#28783 ) Description: Adding new AWS Bedrock model and their respective costs to match https://aws.amazon.com/bedrock/pricing/ for the Bedrock callback Issue: Missing models for those that wish to try them out Dependencies: Nothing added Twitter handle: @David_Pryce and / or @JamfSoftware	2024-12-18 09:45:10 -05:00
Yudai Kotani	05a44797ee	langchain_community: Add default None values to DocumentAttributeValue class properties (#28785 ) Description: This PR addresses an issue where the DocumentAttributeValue class properties did not have default values of None. By explicitly setting the Optional attributes (DateValue, LongValue, StringListValue, and StringValue) to default to None, this change ensures the class functions as expected when no value is provided for these attributes. Changes Made: Added default None values to the following properties of the DocumentAttributeValue class: DateValue LongValue StringListValue StringValue Removed the invalid argument extra="allow" from the BaseModel inheritance. Dependencies: None. Twitter handle (optional): @__korikori1021 Checklist - [x] Verified that KendraRetriever works as expected after the changes. Co-authored-by: y1u0d2a1i <y.kotani@raksul.com>	2024-12-18 09:43:04 -05:00
Satyam Kumar	90f7713399	refactor: improve docstring parsing logic for Google style (#28730 ) Description: Improved the `_parse_google_docstring` function in `langchain/core` to support parsing multi-paragraph descriptions before the `Args:` section while maintaining compliance with Google-style docstring guidelines. This change ensures better handling of docstrings with detailed function descriptions. Issue: Fixes #28628 Dependencies: None. Twitter handle: @isatyamks --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 09:35:19 -05:00
Dong Shin	0b1359801e	community: add trust_env at web_base_loader (#28514 ) - Description: I am working to address a similar issue to the one mentioned in https://github.com/langchain-ai/langchain/pull/19499. Specifically, there is a problem with the Webbase loader used in open-webui, where it fails to load the proxy configuration. This PR aims to resolve that issue. <!--If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.--> --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-17 21:18:16 -05:00
Erick Friis	be738aa7de	packages: enable vertex api build (#28773 )	2024-12-17 11:31:14 -08:00
Bagatur	ac278cbe8b	core[patch]: export InjectedToolCallId (#28772 )	2024-12-17 19:29:20 +00:00
Bagatur	e4d3ccf62f	json mode standard test (#25497 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-17 18:47:34 +00:00
Frank Dai	e81433497b	community: support Confluence cookies (#28760 ) Description: Some confluence instances don't support personal access token, then cookie is a convenient way to authenticate. This PR adds support for Confluence cookies. Twitter handle: soulmachine	2024-12-17 12:16:36 -05:00
ccurme	b745281eec	anthropic[patch]: increase timeouts for integration tests (#28767 ) Some tests consistently ran into the 10s limit in CI.	2024-12-17 15:47:17 +00:00
Vinit Kudva	a00258ec12	chroma: fix persistence if client_settings is passed in (#25199 ) …ent path given. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-17 10:03:02 -05:00
Omri Eliyahu Levy	f8883a1321	partners/voyageai: enable setting output dimension (#28740 ) Voyage has introduced voyage-3-large and voyage-code-3, which feature different output dimensions by leveraging a technique called "Matryoshka Embeddings" (see blog - https://blog.voyageai.com/2024/12/04/voyage-code-3/). These two models are available in various sizes: [256, 512, 1024, 2048] (https://docs.voyageai.com/docs/embeddings#model-choices). This PR adds the option to set the required output dimension.	2024-12-17 10:02:00 -05:00
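For readers unfamiliar with the term, the general idea behind Matryoshka-style embeddings is that a prefix of the full vector is itself a usable embedding. The sketch below is not the code from #28740, which sets the dimension on the API request. `truncate_embedding` is a hypothetical helper that keeps the first `dim` components and re-normalizes them to unit length.

```python
import math
from typing import List


def truncate_embedding(vector: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    # An all-zero prefix cannot be normalized; return it unchanged.
    return [x / norm for x in head] if norm else head
```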
German Martin	3a1d05394d	community: Apache AGE wrapper. Ensure Node Uniqueness by ID. (#28759 ) Description: The Apache AGE graph integration incorrectly handled node merging, allowing duplicate nodes with different IDs but the same type and other properties. Unlike [Neo4j](`cdf6202156/libs/community/langchain_community/graphs/neo4j_graph.py (L47)`), [Memgraph](`cdf6202156/libs/community/langchain_community/graphs/memgraph_graph.py (L50)`), [Kuzu](`cdf6202156/libs/community/langchain_community/graphs/kuzu_graph.py (L253)`), and [Gremlin](`cdf6202156/libs/community/langchain_community/graphs/gremlin_graph.py (L165)`), it did not use the node ID as the primary identifier for merging. This inconsistency caused data integrity issues and unexpected behavior when users expected updates to specific nodes by ID. Solution: This PR modifies the `node_insert_query` to `MERGE` nodes based on label and ID only and updates properties with `SET`, aligning the behavior with other graph database integrations. The `_format_properties` method was also modified to handle id overrides. Impact: This fix ensures data integrity by preventing duplicate nodes, and provides a consistent behavior across graph database integrations.	2024-12-17 09:21:59 -05:00
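The "MERGE on label and ID, then SET the remaining properties" approach from #28759 can be sketched as a query builder. This is a hypothetical illustration in generic Cypher, not the `node_insert_query` used by the AGE wrapper; it does no escaping of labels or property names, so it is unsuitable for untrusted input.

```python
import json
from typing import Any, Dict


def node_merge_query(label: str, node_id: str, properties: Dict[str, Any]) -> str:
    """Build a query that matches nodes by label and id only, then updates properties."""
    # The explicit id always wins over an "id" key inside `properties`.
    props = {**properties, "id": node_id}
    set_clause = ", ".join(
        f"n.{key} = {json.dumps(value)}" for key, value in sorted(props.items())
    )
    return f"MERGE (n:{label} {{id: {json.dumps(node_id)}}}) SET {set_clause} RETURN n"
```

Because only the label and `id` appear in the `MERGE` pattern, re-inserting a node with the same ID but different properties updates the existing node instead of creating a duplicate.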
gsa9989	cdf6202156	cosmosdbnosql: Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook (#24424 ) * Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook --------- Co-authored-by: Aayush Kataria <aayushkataria3011@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 21:57:05 -05:00
Brian Burgin	27a9056725	community: Fix ChatLiteLLMRouter runtime issues (#28163 ) Description: Fix ChatLiteLLMRouter ctor validation and model_name parameter Issue: #19356, #27455, #28077 Twitter handle: @bburgin_0	2024-12-16 18:17:39 -05:00
Mikhail Khludnev	00deacc67e	docs, external: introduce `langchain-localai` (#28751 ) Thank you for contributing to LangChain! Referring to https://github.com/mkhludnev/langchain-localai --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 22:22:37 +00:00
Erick Friis	d4b5e7ef22	community: recommend RedisVectorStore over Redis (#28749 )	2024-12-16 21:08:30 +00:00
Hiros	8f5e72de05	community: Correctly handle multi-element rich text (#25762 ) Description: - Add _concatenate_rich_text method to combine all elements in rich text arrays - Update load_page method to use _concatenate_rich_text for rich text properties - Ensure all text content is captured, including inline code and formatted text - Add unit tests to verify correct handling of multi-element rich text This fix prevents truncation of content after backticks or other formatting elements. Issue: Using Notion DB Loader, the text for `richtext` and `title` is truncated after 1st element was loaded as Notion Loader only read the first element. Dependencies: any dependencies required for this change None. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 20:20:27 +00:00
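The concatenation described in #25762 amounts to joining every element of a rich-text array rather than reading only the first. A minimal sketch, assuming each element carries its text under a `plain_text` key as in Notion API responses (the function name mirrors the PR's description but the body is written here for illustration):

```python
from typing import Any, Dict, List


def concatenate_rich_text(rich_text: List[Dict[str, Any]]) -> str:
    """Join the text of all rich-text elements, in order."""
    # Elements without a "plain_text" key contribute nothing.
    return "".join(item.get("plain_text", "") for item in rich_text)
```

Reading only `rich_text[0]` would stop at the first formatting boundary, for example just before an inline code span.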
Antonio Lanza	b2102b8cc4	text-splitters: Inconsistent results with `NLTKTextSplitter`'s `add_start_index=True` (#27782 ) This PR closes #27781 # Problem The current implementation of `NLTKTextSplitter` is using `sent_tokenize`. However, this `sent_tokenize` doesn't handle chars between 2 tokenized sentences... hence, this behavior throws errors when we are using `add_start_index=True`, as described in issue #27781. In particular: ```python from nltk.tokenize import sent_tokenize output1 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english") print(output1) output2 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english") print(output2) >>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.'] >>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.'] ``` # Solution With this new `use_span_tokenize` parameter, we can use NLTK to create sentences (with `span_tokenize`), but also add extra chars to be sure that we still can map the chunks to the original text. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-12-16 19:53:15 +00:00
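Why spans help with `add_start_index=True` can be shown without NLTK. The sketch below swaps in a simple regular expression for NLTK's `span_tokenize` (so sentence detection is much cruder) and uses hypothetical function names. Each chunk keeps the characters up to the start of the next sentence, so every chunk remains an exact substring of the original text.

```python
import re
from typing import List, Tuple


def sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of naive sentences ending in '.', '!' or '?'."""
    return [m.span() for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]", text)]


def split_keeping_gaps(text: str) -> List[str]:
    """Split into sentences, attaching the gap after each sentence to it."""
    spans = sentence_spans(text)
    if not spans:
        return [text] if text else []
    # Chunk boundaries are the sentence starts; the first chunk starts at 0
    # so that leading characters are not dropped.
    starts = [0] + [start for start, _ in spans[1:]] + [len(text)]
    return [text[starts[i]:starts[i + 1]] for i in range(len(spans))]
```

Since the chunks concatenate back to the input, locating a chunk in the original text no longer fails when sentences are separated by more than one whitespace character.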
Tari Yekorogha	d262d41cc0	community: added FalkorDB vector store support i.e implementation, test, docs an… (#26245 ) Description: Added support for FalkorDB Vector Store, including its implementation, unit tests, documentation, and an example notebook. The FalkorDB integration allows users to efficiently manage and query embeddings in a vector database, with relevance scoring and maximal marginal relevance search. The following components were implemented: - Core implementation for FalkorDBVector store. - Unit tests ensuring proper functionality and edge case coverage. - Example notebook demonstrating an end-to-end setup, search, and retrieval using FalkorDB. Twitter handle: @tariyekorogha --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 19:37:55 +00:00
Aaron Pham	12fced13f4	chore(community): update to OpenLLM 0.6 (#24609 ) Update to OpenLLM 0.6, in which we decided to make use of OpenLLM's OpenAI-compatible endpoint. Thus, OpenLLM will now just become a thin wrapper around the OpenAI wrapper. Signed-off-by: Aaron Pham <contact@aarnphm.xyz> --------- Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-16 14:30:07 -05:00
Lvlvko	5c17a4ace9	community: support Hunyuan Embedding (#23160 ) ## description - I refactored `Chathunyuan` using the tencentcloud sdk because I found the original one did not work in my application - I added `HunyuanEmbeddings` using the tencentcloud sdk - Both of them extend the base classes of langchain. I have fully tested them in my application ## Dependencies - tencentcloud-sdk-python --------- Co-authored-by: centonhuang <centonhuang@tencent.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 19:27:19 +00:00
Harrison Chase	de7996c2ca	core: add kwargs support to VectorStore (#25934 ) has been missing the passthrough until now --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 18:57:57 +00:00
Lorenzo	b79a1156ed	community: correct return type of get_files_from_directory in github tool (#27885 ) ### About: - Description: the _get_files_from_directory_ method returns a string, but it is used in other methods that expect a List[str] - Issue: None - Dependencies: None This pull request introduces a new method _list_files_ with the old logic of _get_files_from_directory_, but it returns a List[str] at the end. The behavior of _get_files_from_directory_ is not changed.	2024-12-16 10:30:33 -08:00
Sheepsta300	580a8d53f9	community: Add configurable `VisualFeatures` to the `AzureAiServicesImageAnalysisTool` (#27444 ) Thank you for contributing to LangChain! - [ ] PR title: community: Add configurable `VisualFeatures` to the `AzureAiServicesImageAnalysisTool` - [ ] PR message: - Description: The `AzureAiServicesImageAnalysisTool` is a good service and utilises the Azure AI Vision package under the hood. However, since the creation of this tool, new `VisualFeatures` have been added to allow the user to request other image specific information to be returned. Currently, the tool offers neither configuration of which features should be return nor does it offer any newer feature types. The aim of this PR is to address this and expose more of the Azure Service in this integration. - Dependencies: no new dependencies in the main class file, azure.ai.vision.imageanalysis added to extra test dependencies file. - [ ] Add tests and docs: If you're adding a new integration, please include 1. Although no tests exist for already implemented Azure Service tools, I've created 3 unit tests for this class that test initialisation and credentials, local file analysis and a test for the new changes/ features option. - [ ] Lint and test: All linting has passed. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 18:30:04 +00:00
Erick Friis	1c120e9615	core: xml output parser tags docstring (#28745 )	2024-12-16 18:25:16 +00:00
Ana	ebab2ea81b	Fix Azure National Cloud authentication using token (RBAC) (Generated by Ana - AI SDE) (#25843 ) This pull request addresses the issue with authenticating Azure National Cloud using token (RBAC) in the AzureSearch vectorstore implementation. ## Changes - Modified the `_get_search_client` method in `azuresearch.py` to pass `additional_search_client_options` to the `SearchIndexClient` instance. ## Implementation Details The patch updates the `SearchIndexClient` initialization to include the `additional_search_client_options` parameter: ```python index_client: SearchIndexClient = SearchIndexClient( endpoint=endpoint, credential=credential, user_agent=user_agent, **additional_search_client_options ) ``` This change allows the `audience` parameter to be correctly passed when using Azure National Cloud, fixing the authentication issues with GovCloud & RBAC. This patch was generated by [Ana - AI SDE](https://openana.ai/), an AI-powered software development assistant. This is a fix for [Issue 25823](https://github.com/langchain-ai/langchain/issues/25823) --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-16 18:22:24 +00:00
chenzimin	169d419581	community: Remove all other keys in ChatLiteLLM and add api_key (#28097 ) Thank you for contributing to LangChain! - PR title: "community: Remove all other keys in ChatLiteLLM and add api_key" - PR message: Currently, no api_key are passed to LiteLLM, and LiteLLM only takes on api_key parameter. Therefore I removed all current `*_api_key` attributes (They are not used), and added `api_key` that is passed to ChatLiteLLM. - Should fix issue #27826 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 17:54:29 +00:00
German Martin	d5d18c62b3	community: Apache AGE wrapper additional edge cases. (#28151 ) Description: The current AGEGraph() implementation does some custom wrapping for graph queries. The method here is _wrap_query(), which parses the fields from the original query to add some SQL context to it. This PR improves the current parsing logic to cover additional edge cases, which are added to the test coverage: if any node property name or value contains the "return" literal, it breaks the graph / SQL query. We discovered this while dealing with real-world datasets; it is not an uncommon scenario and I think it needs to be covered.	2024-12-16 11:28:01 -05:00
Rock2z	768e4a7fd4	[community][fix] Compatibility support to bump up wikibase-rest-api-client version (#27316 ) Description: This PR addresses the `TypeError: sequence item 0: expected str instance, FluentValue found` error when invoking `WikidataQueryRun`. The root cause was an incompatible version of the `wikibase-rest-api-client`, which caused the tool to fail when handling `FluentValue` objects instead of strings. The current implementation only supports `wikibase-rest-api-client<0.2`, but the latest version is `0.2.1`, where the current implementation breaks. Additionally, the error message advises users to install the latest version: [code reference](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/wikidata.py#L125C25-L125C32). Therefore, this PR updates the tool to support the latest version of `wikibase-rest-api-client`. Key changes: - Updated the handling of `FluentValue` objects to ensure compatibility with the latest `wikibase-rest-api-client`. - Removed the restriction to `wikibase-rest-api-client<0.2` and updated to support the latest version (`0.2.1`). Issue: Fixes [#24093](https://github.com/langchain-ai/langchain/issues/24093) – `TypeError: sequence item 0: expected str instance, FluentValue found`. Dependencies: - Upgraded `wikibase-rest-api-client` to the latest version to resolve the issue. --------- Co-authored-by: peiwen_zhang <peiwen_zhang@email.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 16:22:18 +00:00
André Quintino	a26c786bc5	community: refactor opensearch query constructor to use wildcard instead of match in the contain comparator (#26653 ) - Description: Changed the comparator to use a wildcard query instead of match. This modification allows for partial text matching on analyzed fields, which improves the flexibility of the search by performing full-text searches that aren't limited to exact matches. - Issue: The previous implementation used a match query, which performs exact matches on analyzed fields. This approach limited the search capabilities by requiring the query terms to align with the indexed text. The modification to use a wildcard query instead addresses this limitation. The wildcard query allows for partial text matching, which means the search can return results even if only a portion of the term matches the text. This makes the search more flexible and suitable for use cases where exact matches aren't necessary or expected, enabling broader full-text searches across analyzed fields. In short, the problem was that match queries were too restrictive, and the change to wildcard queries enhances the ability to perform partial matches. - Dependencies: none - Twitter handle: @Andre_Q_Pereira --------- Co-authored-by: André Quintino <andre.quintino@tui.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 11:16:34 -05:00
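The two query shapes contrasted in #26653 look roughly as follows in the OpenSearch query DSL. The function names are hypothetical, and wrapping the value in `*` on both sides is an assumption about how a "contains" comparison would be expressed, not a quote from the PR.

```python
from typing import Any, Dict


def contain_as_match(field: str, value: str) -> Dict[str, Any]:
    """Previous shape: a match query on the field."""
    return {"match": {field: {"query": value}}}


def contain_as_wildcard(field: str, value: str) -> Dict[str, Any]:
    """New shape: a wildcard query that matches the value anywhere in a term."""
    return {"wildcard": {field: {"value": f"*{value}*"}}}
```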
Davi Schumacher	0f9b4bf244	community[patch]: update dynamodb chat history to update instead of overwrite (#22397 ) Description: The current implementation of `DynamoDBChatMessageHistory` updates the `History` attribute for a given chat history record by first extracting the existing contents into memory, appending the new message, and then using the `put_item` method to put the record back. This has the effect of overwriting any additional attributes someone may want to include in the record, like chat session metadata. This PR suggests changing from using `put_item` to using `update_item` instead which will keep any other attributes in the record untouched. The change is backward compatible since 1. `update_item` is an "upsert" operation, creating the record if it doesn't already exist, otherwise updating it 2. It only touches the db insert call and passes the exact same information. The rest of the class is left untouched Dependencies: None Tests and docs: No unit tests currently exist for the `DynamoDBChatMessageHistory` class. This PR adds the file `libs/community/tests/unit_tests/chat_message_histories/test_dynamodb_chat_message_history.py` to test the `add_message` and `clear` methods. I wanted to use the moto library to mock DynamoDB calls but I could not get poetry to resolve it so I mocked those calls myself in the test. Therefore, no test dependencies were added. The change was tested on a test DynamoDB table as well. The first three images below show the current behavior. First a message is added to chat history, then a value is inserted in the record in some other attribute, and finally another message is added to the record, destroying the other attribute. 
![using_put_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/426acd62-fe29-42f4-b75f-863fb8b3fb21) ![using_put_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/f8a1c864-7114-4fe3-b487-d6f9252f8f92) ![using_put_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/8b691e08-755e-4877-8969-0e9769e5d28a) The next three images show the new behavior. Once again a value is added to an attribute other than the History attribute, but now when the followup message is added it does not destroy that other attribute. The History attribute itself is unaffected by this change. ![using_update_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/3e0d76ed-637e-41cd-82c7-01a86c468634) ![using_update_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/52585f9b-71a2-43f0-9dfc-9935aa59c729) ![using_update_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/f94c8147-2d6f-407a-9a0f-86b94341abff) The doc located at `docs/docs/integrations/memory/aws_dynamodb.ipynb` required no changes and was tested as well.	2024-12-16 10:38:00 -05:00
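The overwrite-versus-update behavior described in 0f9b4bf244 can be illustrated without a DynamoDB connection. The sketch below is a toy in-memory model, not the boto3 API: `put_item` and `update_history` are hypothetical helpers that only mimic "replace the whole record" and "upsert one attribute" semantics, and `UserTier` is a made-up extra attribute.

```python
from typing import Any, Callable, Dict, List

# Toy in-memory "table": session id -> record (a dict of attributes).
Table = Dict[str, Dict[str, Any]]


def put_item(table: Table, session_id: str, history: List[str]) -> None:
    """Replace the whole record, which drops any attribute not written here."""
    table[session_id] = {"SessionId": session_id, "History": history}


def update_history(table: Table, session_id: str, history: List[str]) -> None:
    """Upsert: create the record if missing, otherwise set only History."""
    record = table.setdefault(session_id, {"SessionId": session_id})
    record["History"] = history


def demo(write: Callable[[Table, str, List[str]], None]) -> Dict[str, Any]:
    table: Table = {}
    write(table, "s1", ["hello"])
    table["s1"]["UserTier"] = "gold"  # extra attribute added out of band
    write(table, "s1", ["hello", "how are you?"])
    return table["s1"]


after_put = demo(put_item)
after_update = demo(update_history)
print("UserTier" in after_put)     # False: the second write replaced the record
print("UserTier" in after_update)  # True: only History was touched
```

In both cases the `History` attribute ends up with the same two messages; only the fate of the unrelated attribute differs.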
Christophe Bornet	6ddd5dbb1e	community: Add FewShotSQLTool (#28232 ) The `FewShotSQLTool` gets some SQL query examples from a `BaseExampleSelector` for a given question. This is useful to provide [few-shot examples](https://python.langchain.com/docs/how_to/sql_prompting/#few-shot-examples) capability to an SQL agent. Example usage: ```python from langchain.agents.agent_toolkits.sql.prompt import SQL_PREFIX embeddings = OpenAIEmbeddings() example_selector = SemanticSimilarityExampleSelector.from_examples( examples, embeddings, AstraDB, k=5, input_keys=["input"], collection_name="lc_few_shots", token=ASTRA_DB_APPLICATION_TOKEN, api_endpoint=ASTRA_DB_API_ENDPOINT, ) few_shot_sql_tool = FewShotSQLTool( example_selector=example_selector, description="Input to this tool is the input question, output is a few SQL query examples related to the input question. Always use this tool before checking the query with sql_db_query_checker!" ) agent = create_sql_agent( llm=llm, db=db, prefix=SQL_PREFIX + "\nYou MUST get some example queries before creating the query.", extra_tools=[few_shot_sql_tool] ) result = agent.invoke({"input": "How many artists are there?"}) ``` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 15:37:21 +00:00
Mohammad Mohtashim	8d746086ab	Added `bind_tools` support for `ChatMLX` along with small fix in `_stream` (#28743 ) - Description: Added support for `bind_tools` as requested in the issue. Two issues in `_stream` were also fixed: - Corrected the positional argument passing for `generate_step` - Handled the case where the `token` returned by `generate_step` is an integer. - Issue: #28692	2024-12-16 09:52:49 -05:00
Jorge Piedrahita Ortiz	558b65ea32	community: SamabaStudio Tool Calling and Structured Output (#28025 ) Description: Add tool calling and structured output support for SambaStudio chat models, docs included --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 06:15:19 +00:00
clairebehue	fb44e74ca4	community: fix AzureSearch Oauth with azure_ad_access_token (#26995 ) Description: AzureSearch vector store: create a wrapper class on `azure.core.credentials.TokenCredential` (which is not-instantiable) to fix Oauth usage with `azure_ad_access_token` argument Issue: [the issue it fixes](https://github.com/langchain-ai/langchain/issues/26216) Dependencies: None - [x] Lint and test --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 05:56:45 +00:00
SirSmokeAlot	29305cd948	community: O365Toolkit - send_event - fixed timezone error (#25876 ) Description: Fixed the formatting of the start and end times Issue: The old formatting resulted in a timezone error every time Dependencies: / Twitter handle: / --------- Co-authored-by: Yannick Opitz <yannick.opitz@gob.de> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 05:32:28 +00:00
Erick Friis	4f6ccb7080	text-splitters: extended-tests without socket (#28736 )	2024-12-16 05:19:50 +00:00
Erick Friis	8ec1c72e03	text-splitters: test without socket (#28732 )	2024-12-15 22:10:35 +00:00
Aayush Kataria	d417e4b372	Community: Azure CosmosDB No Sql Vector Store: Full Text and Hybrid Search Support (#28716 ) - Added [full text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search) and [hybrid search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search) support for Azure CosmosDB NoSql Vector Store - Added a new enum called CosmosDBQueryType which supports the following values: - VECTOR = "vector" - FULL_TEXT_SEARCH = "full_text_search" - FULL_TEXT_RANK = "full_text_rank" - HYBRID = "hybrid" - Users now need to provide this query_type to the similarity_search method so that the vector store makes the correct query API call. - Added a couple of workarounds, because the FULL_TEXT_RANK and HYBRID query functions don't support parameterized queries right now. I have added TODOs in place, and will remove these workarounds by end of January. - Added necessary test cases and updated the --------- Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-12-15 13:26:32 -08:00
Mohammad Mohtashim	4c1871d9a8	community: Passing the `model_kwargs` correctly while maintaing backward compatability (#28439 ) - Description: `model_kwargs` was not being passed correctly to `sentence_transformers.SentenceTransformer`; this has been corrected while maintaining backward compatibility - Issue: #28436 --------- Co-authored-by: MoosaTae <sadhis.tae@gmail.com> Co-authored-by: Sadit Wongprayon <101176694+MoosaTae@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-15 20:34:29 +00:00
nhols	a3851cb3bc	community: FAISS vectorstore - consistent Document id field (#28728 ) make sure id field of Documents in `FAISS` docstore have the same id as values in `index_to_docstore_id`, implement `get_by_ids` method	2024-12-15 12:23:49 -08:00
Bagatur	a0534ae62a	community[patch]: Release 0.3.12 (#28725 )	2024-12-14 22:13:20 +00:00
Bagatur	089e659e03	langchain[patch]: Release 0.3.12 (#28724 )	2024-12-14 20:02:18 +00:00
Bagatur	679e3a9970	text-splitters[patch]: Release 0.3.3 (#28723 )	2024-12-14 19:20:22 +00:00
Erick Friis	387284c259	core: release 0.3.25 (#28718 )	2024-12-14 02:22:28 +00:00
Nawaf Alharbi	decd77c515	community: fix an issue with deepinfra integration (#28715 ) - PR title: langchain: add URL parameter to ChatDeepInfra class - Description: This PR introduces a url parameter to the ChatDeepInfra class in LangChain, allowing users to specify a custom URL. Previously, the URL for the DeepInfra API was hardcoded to "https://stage.api.deepinfra.com/v1/openai/chat/completions", which caused issues when the staging endpoint was not functional. The _url method was updated to return the value from the url parameter, enabling greater flexibility and addressing the problem. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-14 02:15:29 +00:00
Ben Chambers	008efada2c	[community]: Render documents to graphviz (#24830 ) - Description: Adds a helper that renders documents with the GraphVectorStore metadata fields to Graphviz for visualization. This is helpful for understanding and debugging. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-14 02:02:09 +00:00
Erick Friis	288f204758	docs, community: aerospike docs update (#28717 ) Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com> Co-authored-by: Jesse S <jschmidt@aerospike.com> Co-authored-by: dylan <dwelch@aerospike.com>	2024-12-14 00:27:37 +00:00
Vimpas	337fed80a5	community: 🐛 PDF Filter Type Error (#27154 ) PR title: "community: fix PDF Filter Type Error" - Description: fix PDF Filter Type Error - Issue: fixes #27153 - Dependencies: none --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 23:30:29 +00:00
Ryan Parker	12111cb922	community: fallback on core async atransform_documents method for `MarkdownifyTransformer` (#27866 ) # Description Implements the `atransform_documents` method for `MarkdownifyTransformer` using the `asyncio` built-in library for concurrency. Note that this is mainly for API completeness when working with async frameworks rather than for performance, since the `markdownify` function is not I/O bound because it works with `Document` objects already in memory. # Issue Fixes #27865 # Dependencies No new dependencies added, but [`markdownify`](https://github.com/matthewwithanm/python-markdownify) is required since this PR updates the `markdownify` integration. # Tests and docs - Tests added - I did not modify the docstrings since they already described the basic functionality, and [the API docs also already included a description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents). If it would be helpful, I would be happy to update the docstrings and/or the API docs. # Lint and test - [x] format - [x] lint - [x] test I ran formatting with `make format`, linting with `make lint`, and confirmed that tests pass using `make test`. Note that some unit tests pass in CI but may fail when running `make_test`. Those unit tests are: - `test_extract_html` (and `test_extract_html_async`) - `test_strip_tags` (and `test_strip_tags_async`) - `test_convert_tags` (and `test_convert_tags_async`) The reason for the difference is that there are trailing spaces when the tests are run in the CI checks, and no trailing spaces when run with `make test`. I ensured that the tests pass in CI, but they may fail with `make test` due to the addition of trailing spaces. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 22:32:22 +00:00
Manuel	af2e0a7ede	partners: add 'model' alias for consistency in embedding classes (#28374 ) Description: This PR introduces a `model` alias for the embedding classes that contain the attribute `model_name`, to ensure consistency across the codebase, as suggested by a moderator in a previous PR. The change aligns the usage of attribute names across the project (see for example [here](`65deeddd5d/libs/partners/groq/langchain_groq/chat_models.py (L304)`)). Issue: This PR addresses the suggestion from the review of issue #28269. Dependencies: None --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 22:30:00 +00:00
Erick Friis	3107d78517	huggingface: fix standard test lint (#28714 )	2024-12-13 22:18:54 +00:00
Kaiwei Zhang	b909d54e70	chroma[patch]: Update logic for assigning ids	2024-12-13 21:58:34 +00:00
Karthik Bharadhwaj	498f0249e2	community[minor]: Opensearch hybridsearch implementation (#25375 ) community: add hybrid search in opensearch # Langchain OpenSearch Hybrid Search Implementation ## Implementation of Hybrid Search: I have taken LangChain's OpenSearch integration to the next level by adding hybrid search capabilities. Building on the existing OpenSearchVectorSearch class, I have implemented Hybrid Search functionality (which combines the best of both keyword and semantic search). This new functionality allows users to harness the power of OpenSearch's advanced hybrid search features without leaving the familiar LangChain ecosystem. By blending traditional text matching with vector-based similarity, the enhanced class delivers more accurate and contextually relevant results. It's designed to seamlessly fit into existing LangChain workflows, making it easy for developers to upgrade their search capabilities. In implementing the hybrid search for OpenSearch within the LangChain framework, I also incorporated filtering capabilities. It's important to note that according to the OpenSearch hybrid search documentation, only post-filtering is supported for hybrid queries. This means that the filtering is applied after the hybrid search results are obtained, rather than during the initial search process. Note: For the implementation of hybrid search, I strictly followed the official OpenSearch Hybrid search documentation and I took inspiration from https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search Thanks Mate! ### Experiments I conducted few experiments to verify that the hybrid search implementation is accurate and capable of reproducing the results of both plain keyword search and vector search. Experiment - 1 Hybrid Search Keyword_weight: 1, vector_weight: 0 I conducted an experiment to verify the accuracy of my hybrid search implementation by comparing it to a plain keyword search. 
For this test, I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid search, effectively giving full weightage to the keyword component. The results from this hybrid search configuration matched those of a plain keyword search, confirming that my implementation can accurately reproduce keyword-only search results when needed. It's important to note that while the results were the same, the scores differed between the two methods. This difference is expected because the plain keyword search in OpenSearch uses the BM25 algorithm for scoring, whereas the hybrid search still performs both keyword and vector searches before normalizing the scores, even when the vector component is given zero weight. This experiment validates that my hybrid search solution correctly handles the keyword search component and properly applies the weighting system, demonstrating its accuracy and flexibility in emulating different search scenarios. Experiment - 2 Hybrid Search keyword_weight = 0.0, vector_weight = 1.0 For experiment-2, I took the inverse approach to further validate my hybrid search implementation. I set the keyword_weight to 0 and the vector_weight to 1, effectively giving full weightage to the vector search component (KNN search). I then compared these results with a pure vector search. The outcome was consistent with my expectations: the results from the hybrid search with these settings exactly matched those from a standalone vector search. This confirms that my implementation accurately reproduces vector search results when configured to do so. As with the first experiment, I observed that while the results were identical, the scores differed between the two methods. This difference in scoring is expected and can be attributed to the normalization process in hybrid search, which still considers both components even when one is given zero weight. 
This experiment further validates the accuracy and flexibility of my hybrid search solution, demonstrating its ability to effectively emulate pure vector search when needed while maintaining the underlying hybrid search structure. Experiment - 3 Hybrid Search - balanced keyword_weight = 0.5, vector_weight = 0.5 For experiment-3, I adopted a balanced approach to further evaluate the effectiveness of my hybrid search implementation. In this test, I set both the keyword_weight and vector_weight to 0.5, giving equal importance to keyword-based and vector-based search components. This configuration aims to leverage the strengths of both search methods simultaneously. By setting both weights to 0.5, I intended to create a scenario where the hybrid search would consider lexical matches and semantic similarity equally. This balanced approach is often ideal for many real-world applications, as it can capture both exact keyword matches and contextually relevant results that might not contain the exact search terms. Kindly verify the notebook for the experiments conducted! Notebook: https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb ### Instructions to follow for Performing Hybrid Search: Step-1: Instantiating OpenSearchVectorSearch Class: ```python opensearch_vectorstore = OpenSearchVectorSearch( index_name=os.getenv("INDEX_NAME"), embedding_function=embedding_model, opensearch_url=os.getenv("OPENSEARCH_URL"), http_auth=(os.getenv("OPENSEARCH_USERNAME"),os.getenv("OPENSEARCH_PASSWORD")), use_ssl=False, verify_certs=False, ssl_assert_hostname=False, ssl_show_warn=False ) ``` Parameters: 1. index_name: The name of the OpenSearch index to use. 2. embedding_function: The function or model used to generate embeddings for the documents. It's assumed that embedding_model is defined elsewhere in the code. 3. opensearch_url: The URL of the OpenSearch instance. 4. 
http_auth: A tuple containing the username and password for authentication. 5. use_ssl: Set to False, indicating that the connection to OpenSearch is not using SSL/TLS encryption. 6. verify_certs: Set to False, which means the SSL certificates are not being verified. This is often used in development environments but is not recommended for production. 7. ssl_assert_hostname: Set to False, disabling hostname verification in SSL certificates. 8. ssl_show_warn: Set to False, suppressing SSL-related warnings. Step-2: Configure Search Pipeline: To use hybrid search, you first need to configure a search pipeline. Implementation Details: This method configures a search pipeline in OpenSearch that: 1. Normalizes the scores from both keyword and vector searches using the min-max technique. 2. Applies the specified weights to the normalized scores. 3. Calculates the final score using an arithmetic mean of the weighted, normalized scores. Parameters: * pipeline_name (str): A unique identifier for the search pipeline. It's recommended to use a descriptive name that indicates the weights used for keyword and vector searches. * keyword_weight (float): The weight assigned to the keyword search component. This should be a float value between 0 and 1. In this example, 0.3 gives 30% importance to traditional text matching. * vector_weight (float): The weight assigned to the vector search component. This should be a float value between 0 and 1. In this example, 0.7 gives 70% importance to semantic similarity. ```python opensearch_vectorstore.configure_search_pipelines( pipeline_name="search_pipeline_keyword_0.3_vector_0.7", keyword_weight=0.3, vector_weight=0.7, ) ``` Step-3: Performing Hybrid Search: After creating the search pipeline, you can perform a hybrid search using the `similarity_search()` method or any other search method supported by `langchain`. 
This method combines both `keyword-based and semantic similarity` searches on your OpenSearch index, leveraging the strengths of both traditional information retrieval and vector embedding techniques. parameters: * query: The search query string. * k: The number of top results to return (in this case, 3). * search_type: Set to `hybrid_search` to use both keyword and vector search capabilities. * search_pipeline: The name of the previously created search pipeline. ```python query = "what are the country named in our database?" top_k = 3 pipeline_name = "search_pipeline_keyword_0.3_vector_0.7" matched_docs = opensearch_vectorstore.similarity_search_with_score( query=query, k=top_k, search_type="hybrid_search", search_pipeline = pipeline_name ) matched_docs ``` twitter handle: @iamkarthik98 --------- Co-authored-by: Karthik Kolluri <karthik.kolluri@eidosmedia.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 16:34:12 -05:00
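The scoring steps that 498f0249e2 describes for the search pipeline (min-max normalization, weighting, arithmetic mean) can be sketched in plain Python. This is an illustration of the arithmetic only, not OpenSearch's implementation: the function names, the handling of documents missing from one result set (scored 0.0), the tie case in `min_max`, and the sample scores are all assumptions made for this example.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1]; if all scores are equal, use 1.0 (assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(
    keyword: Dict[str, float],
    vector: Dict[str, float],
    keyword_weight: float,
    vector_weight: float,
) -> Dict[str, float]:
    """Weighted arithmetic mean of the two normalized score sets."""
    kw, vec = min_max(keyword), min_max(vector)
    total = keyword_weight + vector_weight
    return {
        doc: (keyword_weight * kw.get(doc, 0.0) + vector_weight * vec.get(doc, 0.0)) / total
        for doc in set(kw) | set(vec)
    }


def ranked(scores: Dict[str, float]) -> List[str]:
    return sorted(scores, key=scores.get, reverse=True)


# Made-up raw scores: BM25-like for keyword, similarity-like for vector.
keyword_raw = {"a": 12.0, "b": 3.0, "c": 6.0}
vector_raw = {"a": 0.2, "b": 0.9, "c": 0.55}

keyword_only = ranked(hybrid_scores(keyword_raw, vector_raw, 1.0, 0.0))
vector_only = ranked(hybrid_scores(keyword_raw, vector_raw, 0.0, 1.0))
blended = ranked(hybrid_scores(keyword_raw, vector_raw, 0.3, 0.7))
print(keyword_only, vector_only, blended)
```

With a weight of 1 on one component, the ranking matches that component's own ranking while the numeric scores differ from the raw ones, which is consistent with what Experiments 1 and 2 in the message report.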
Philippe PRADOS	f3fb5a9c68	community[minor]: Fix json._validate_metadata_func() (#22842 ) JSONparse, in _validate_metadata_func(), checks the consistency of the _metadata_func() function. To do this, it invokes it and makes sure it receives a dictionary in response. However, during the call, it does not respect future calls, as shown on line 100. This generates errors if, for example, the function is like this: ```python def generate_metadata(json_node:Dict[str,Any],kwargs:Dict[str,Any]) -> Dict[str,Any]: return { "source": url, "row": kwargs['seq_num'], "question":json_node.get("question"), } loader = JSONLoader( file_path=file_path, content_key="answer", jq_schema='.[]', metadata_func=generate_metadata, text_content=False) ``` To avoid this, the verification must comply with the specifications. This patch does just that. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 21:24:20 +00:00
Keiichi Hirobe	67fd554512	core[patch]: throw exception indexing code if deletion fails in vectorstore (#28103 ) The delete methods in the VectorStore and DocumentIndex interfaces return a status indicating the result. Therefore, we can assume that their implementations don't throw exceptions but instead return a result indicating whether the delete operations have failed. The current implementation doesn't check the returned value, so I modified it to throw an exception when the operation fails. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 16:14:27 -05:00
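The check added in 67fd554512 amounts to inspecting the value returned by a delete call instead of ignoring it. A minimal sketch follows, assuming a delete callable that returns `True`, `False`, or `None`; `IndexingError`, `delete_or_raise`, and the two stand-in delete functions are hypothetical names for this example, and treating only an explicit `False` as failure is an assumption.

```python
from typing import Callable, Optional, Sequence

DeleteFn = Callable[[Sequence[str]], Optional[bool]]


class IndexingError(Exception):
    """Raised when a delete reports failure (hypothetical name for this sketch)."""


def delete_or_raise(delete: DeleteFn, ids: Sequence[str]) -> None:
    """Call delete and turn a reported failure into an exception."""
    result = delete(ids)
    # Assumption: None means "no status reported" and is not treated as failure.
    if result is False:
        raise IndexingError(f"failed to delete {len(ids)} document(s)")


def ok_delete(ids: Sequence[str]) -> Optional[bool]:
    return True


def failing_delete(ids: Sequence[str]) -> Optional[bool]:
    return False


delete_or_raise(ok_delete, ["doc-1"])  # returns quietly
try:
    delete_or_raise(failing_delete, ["doc-1"])
    raised = False
except IndexingError:
    raised = True
print(raised)  # True
```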
Keiichi Hirobe	258b3be5ec	core[minor]: add new clean up strategy "scoped_full" to indexing (#28505 ) ~Note that this PR is now Draft, so I didn't add change to `aindex` function and didn't add test codes for my change. After we have an agreement on the direction, I will add commits.~ `batch_size` is very difficult to decide because setting a large number like >10000 will impact VectorDB and RecordManager, while setting a small number will delete records unnecessarily, leading to redundant work, as the `IMPORTANT` section says. On the other hand, we can't use `full` because the loader returns just a subset of the dataset in our use case. I guess many people are in the same situation as us. So, as one of the possible solutions for it, I would like to introduce a new argument, `scoped_full_cleanup`. This argument will be valid only when `cleanup` is Full. If True, Full cleanup deletes all documents that haven't been updated AND that are associated with source ids that were seen during indexing. Default is False. This change keeps backward compatibility. --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 20:35:25 +00:00
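The rule stated in 258b3be5ec (delete documents that were not updated AND whose source id was seen during this indexing run) can be sketched as a filter over a record table. This is a simplified model, not the indexing API: the record layout, `stale_keys`, and the integer timestamps are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record table: key -> (source_id, last_updated_at)
Records = Dict[str, Tuple[str, int]]


def stale_keys(
    records: Records,
    run_started_at: int,
    seen_sources: Set[str],
    scoped: bool,
) -> List[str]:
    """Return the keys a cleanup pass would delete.

    Full cleanup: everything not updated during this run.
    Scoped full cleanup: the same, restricted to sources seen in this run.
    """
    stale = []
    for key, (source_id, updated_at) in records.items():
        if updated_at >= run_started_at:
            continue  # touched during this run, keep it
        if scoped and source_id not in seen_sources:
            continue  # source was not part of this run, leave it alone
        stale.append(key)
    return sorted(stale)


records: Records = {
    "a1": ("a.txt", 5),  # re-indexed in this run
    "a2": ("a.txt", 1),  # old chunk of a source seen in this run
    "b1": ("b.txt", 1),  # belongs to a source the loader did not return
}

full = stale_keys(records, run_started_at=5, seen_sources={"a.txt"}, scoped=False)
scoped_full = stale_keys(records, run_started_at=5, seen_sources={"a.txt"}, scoped=True)
print(full)         # ['a2', 'b1']
print(scoped_full)  # ['a2']
```

The example shows the situation the message describes: when the loader returns only a subset of the dataset, an unscoped full cleanup would also delete `b1`.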
Eugene Yurtsev	ce90b25313	core[patch]: Update error message in indexing code for unreachable code assertion (#28712 ) Minor update for error message that should never be triggered	2024-12-13 20:21:14 +00:00
Keiichi Hirobe	da28cf1f54	core[patch]: Reverts PR #25754 and add unit tests (#28702 ) I reported the bug 2 weeks ago here: https://github.com/langchain-ai/langchain/issues/28447 I believe this is a critical bug for the indexer, so I submitted a PR to revert the change and added unit tests to prevent similar bugs from being introduced in the future. @eyurtsev Could you check this?	2024-12-13 15:13:06 -05:00
ScriptShi	b0a298894d	community[minor]: Add TablestoreVectorStore (#25767 ) - Description: add TablestoreVectorStore - Dependencies: none - Tests and docs: a test for the integration: yes; an example notebook showing its use: yes --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-12-13 11:17:28 -08:00
Erick Friis	86b3c6e81c	community: make old stub for QuerySQLDataBaseTool private to skip api ref (#28711 )	2024-12-13 10:43:23 -08:00
Martin Triska	05ebe1e66b	Community: add `modified_since` argument to `O365BaseLoader` (#28708 ) ## What are we doing in this PR We're adding an optional `modified_since` argument to `O365BaseLoader`. When set, the O365 loader will only load documents newer than the `modified_since` datetime. ## Why? OneDrive and SharePoint sites can contain a large number of documents. The current approach is to download and parse all files and let the indexer deal with duplicates. This can be prohibitively time-consuming, especially when using an OCR-based parser like [zerox](`fa06188834/libs/community/langchain_community/document_loaders/pdf.py (L948)`). This argument allows skipping documents that are older than the known time of indexing. _Q: What if a file was modified during the last indexing process? A: Users can set `modified_since` conservatively and the indexer will still take care of duplicates._ --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 17:30:17 +00:00
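The filtering idea behind `modified_since` in 05ebe1e66b can be sketched independently of the O365 client. The `RemoteFile` type, `select_files`, and the sample data below are hypothetical; only the rule "load documents newer than `modified_since`, or everything when it is unset" comes from the message.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class RemoteFile:
    """Hypothetical stand-in for a file listing entry."""
    name: str
    modified: datetime


def select_files(files: List[RemoteFile], modified_since: Optional[datetime]) -> List[str]:
    """Keep only files modified after the cutoff; keep all if no cutoff is given."""
    if modified_since is None:
        return [f.name for f in files]
    return [f.name for f in files if f.modified > modified_since]


files = [
    RemoteFile("old.pdf", datetime(2024, 11, 1, tzinfo=timezone.utc)),
    RemoteFile("new.pdf", datetime(2024, 12, 10, tzinfo=timezone.utc)),
]
cutoff = datetime(2024, 12, 1, tzinfo=timezone.utc)

print(select_files(files, None))    # ['old.pdf', 'new.pdf']
print(select_files(files, cutoff))  # ['new.pdf']
```

Choosing a cutoff somewhat earlier than the last indexing time, as the Q&A in the message suggests, re-loads a few files and leaves deduplication to the indexer.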
Bagatur	fa06188834	community[patch]: fix QuerySQLDatabaseTool name (#28659 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-12 19:16:03 -08:00
Erick Friis	48ab91b520	docs: more useful vercel warnings (#28699 )	2024-12-13 03:07:24 +00:00
Michael Chin	28cb2cefc6	docs: Fix stack diagram in community README (#28685 ) - Description: The stack diagram illustration in the community README fails to render due to an invalid branch reference. This PR replaces the broken image link with a valid one referencing master branch.	2024-12-12 13:33:50 -08:00
Botong Zhu	13c3c4a210	community: fixes json loader not getting texts with json standard (#27327 ) This PR fixes JSONLoader._get_text not converting objects to JSON strings correctly. If an object is serializable and is not a dict, JSONLoader uses Python's built-in str() to convert it to a string. The resulting string may not follow the JSON standard. For example, a list is converted to a string with single quotes, and json.loads raises an error when it tries to load that string. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 19:33:45 +00:00
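The failure mode described in 13c3c4a210 is easy to reproduce with the standard library alone. The snippet below is an illustration of `str()` versus `json.dumps()` on a list, not the loader's code; the variable names are made up for the example.

```python
import json

value = ["alpha", "beta"]

as_str = str(value)          # Python repr, uses single quotes
as_json = json.dumps(value)  # JSON text, uses double quotes

try:
    json.loads(as_str)
    str_round_trips = True
except json.JSONDecodeError:
    str_round_trips = False

print(as_str)            # ['alpha', 'beta']
print(as_json)           # ["alpha", "beta"]
print(str_round_trips)   # False
print(json.loads(as_json) == value)  # True
```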
Lorenzo	4149c0dd8d	community: add method to create branch and list files for gitlab tool (#27883 ) ### About - Description: The Gitlab utilities used by the Gitlab tool have no methods to create branches or to list branches and files, although the Github utilities already provide them - Issue: None - Dependencies: None This pull request adds the methods: - create_branch - list_branches_in_repo - set_active_branch - list_files_in_main_branch - list_files_in_bot_branch - list_files_from_directory --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 19:11:35 +00:00
Prathamesh Nimkar	ca054ed1b1	community: ChatSnowflakeCortex - Add streaming functionality (#27753 ) Description: snowflake.py: - Add _stream and _stream_content methods to enable streaming functionality - fix pydantic issues and added functionality with the overall langchain version upgrade - added bind_tools method for agentic workflows support through langgraph - updated the _generate method to account for agentic workflows support through langgraph - cosmetic changes to comments and if conditions snowflake.ipynb: - Added _stream example - cosmetic changes to comments - fixed lint errors check_pydantic.sh: - Decreased counter from 126 to 125 as suggested when formatting --------- Co-authored-by: Prathamesh Nimkar <prathamesh.nimkar@snowflake.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 18:35:40 -08:00
Wang, Yi	d834c6b618	huggingface: fix tool argument serialization in _convert_TGI_message_to_LC_message (#26075 ) Currently `_convert_TGI_message_to_LC_message` replaces `'` in the tool arguments, so an argument like "It's" will be converted to `It"s` and could cause a json parser to fail. --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Vadym Barda <vadym@langchain.dev>	2024-12-11 18:34:32 -08:00
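The problem described in d834c6b618 can be reproduced with a few lines of standard-library Python. This sketch is not the `_convert_TGI_message_to_LC_message` code; it only assumes, for illustration, that arguments are stringified and every `'` is swapped for `"`, and compares that with `json.dumps`.

```python
import json

arguments = {"text": "It's raining"}

# Naive approach: stringify the dict, then swap every single quote.
naive = str(arguments).replace("'", '"')

try:
    json.loads(naive)
    naive_parses = True
except json.JSONDecodeError:
    naive_parses = False

# Serializing with json.dumps keeps the apostrophe inside the string value.
safe = json.dumps(arguments)

print(naive)         # {"text": "It"s raining"}
print(naive_parses)  # False
print(json.loads(safe) == arguments)  # True
```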
Lakindu Boteju	5a31792bf1	community: Add support for cross-region inference profile IDs in Bedrock Anthropic Claude token cost calculation (#28167 ) This change modifies the token cost calculation logic to support cross-region inference profile IDs for Anthropic Claude models. Instead of explicitly listing all regional variants of new inference profile IDs in the cost dictionaries, the code now extracts a base model ID from the input model ID (or inference profile ID), making it more maintainable and automatically supporting new regional variants. These inference profile IDs follow the format: `<region>.<vendor>.<model-name>` (e.g., `us.anthropic.claude-3-haiku-xxx`, `eu.anthropic.claude-3-sonnet-xxx`). Cross-region inference profiles are system-defined identifiers that enable distributing model inference requests across multiple AWS regions. They help manage unplanned traffic bursts and enhance resilience during peak demands without additional routing costs. References for Amazon Bedrock's cross-region inference profiles:- - https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html - https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 02:33:50 +00:00
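The base-model extraction described in 5a31792bf1 can be sketched as stripping a leading region segment from IDs of the form `<region>.<vendor>.<model-name>`. Everything below is illustrative: `REGION_PREFIXES`, `base_model_id`, and the cost table (including its placeholder price) are assumptions, and the model IDs are the elided examples from the message, not real identifiers.

```python
from typing import Dict

# Assumed set of region prefixes, taken from the examples in the message.
REGION_PREFIXES = {"us", "eu"}

# Placeholder cost table keyed by base model ID; the value is not a real price.
COST_PER_1K_INPUT_TOKENS: Dict[str, float] = {
    "anthropic.claude-3-haiku-xxx": 1.0,
}


def base_model_id(model_id: str) -> str:
    """Drop a leading '<region>.' segment if present, otherwise return the ID as-is."""
    prefix, _, rest = model_id.partition(".")
    if prefix in REGION_PREFIXES and rest:
        return rest
    return model_id


def input_cost(model_id: str, tokens: int) -> float:
    """Look up the cost by base model ID so regional variants share one entry."""
    return COST_PER_1K_INPUT_TOKENS[base_model_id(model_id)] * tokens / 1000


print(base_model_id("us.anthropic.claude-3-haiku-xxx"))  # anthropic.claude-3-haiku-xxx
print(base_model_id("anthropic.claude-3-haiku-xxx"))     # anthropic.claude-3-haiku-xxx
print(input_cost("eu.anthropic.claude-3-haiku-xxx", 2000))  # 2.0
```

Keying the table by base model ID is what lets new regional variants work without adding one dictionary entry per region.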
fatmelon	d1e0ec7b55	community: VectorStores: Azure Cosmos DB Mongo vCore with DiskANN (#27329 ) # Description Add a new vector index type `diskann` to Azure Cosmos DB Mongo vCore vector store. Paper of DiskANN can be found here [DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf). ## Sample Usage ```python from pymongo import MongoClient # INDEX_NAME = "izzy-test-index-2" # NAMESPACE = "izzy_test_db.izzy_test_collection" # DB_NAME, COLLECTION_NAME = NAMESPACE.split(".") client: MongoClient = MongoClient(CONNECTION_STRING) collection = client[DB_NAME][COLLECTION_NAME] model_deployment = os.getenv( "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada" ) model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002") vectorstore = AzureCosmosDBVectorSearch.from_documents( docs, openai_embeddings, collection=collection, index_name=INDEX_NAME, ) # Read more about these variables in detail here. https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search maxDegree = 40 dimensions = 1536 similarity_algorithm = CosmosDBSimilarityType.COS kind = CosmosDBVectorSearchType.VECTOR_DISKANN lBuild = 20 vectorstore.create_index( dimensions=dimensions, similarity=similarity_algorithm, kind=kind , max_degree=maxDegree, l_build=lBuild, ) ``` ## Dependencies No additional dependencies were added --------- Co-authored-by: Yang Qiao (from Dev Box) <yangqiao@microsoft.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 01:54:04 +00:00
manukychen	ba9b95cd23	Community: Adding bulk_size as a setable param for OpenSearchVectorSearch (#28325 ) Description: When using langchain.retrievers.parent_document_retriever.py with OpenSearchVectorSearch as the vectorstore, I found that the bulk_size param passed to the OpenSearchVectorSearch class was not applied by ParentDocumentRetriever.add_documents(). It was overwritten by the default of 500 used in OpenSearchVectorSearch's own methods (e.g., add_texts(), add_embeddings()). This PR fixes that. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 01:45:22 +00:00
xintoteai	45f9c9ae88	langchain: fixed weaviate (v4) vectorstore import for self-query retriever (#28675 ) Co-authored-by: Xin Heng <xin.heng@gmail.com>	2024-12-11 15:53:41 -08:00
Thomas van Dongen	ee640d6bd3	community: fixed bug in model2vec embedding code (#28670 ) This PR fixes a bug with the current implementation for Model2Vec embeddings where `embed_documents` does not work as expected. - Description: the current implementation uses `encode_as_sequence` for encoding documents. This is incorrect, as `encode_as_sequence` creates token embeddings and not mean embeddings. The normal `encode` function handles both single and batched inputs and should be used instead. The return type was also incorrect, as encode returns a NumPy array. This PR converts the embedding to a list so that the output is consistent with the Embeddings ABC.	2024-12-11 15:50:56 -08:00
Brian Sharon	b20230c800	community: use correct `id_key` when deleting by id in LanceDB wrapper (#28655 ) - Description: The current version of the `delete` method assumes that the id field will always be called `id`. - Issue: n/a - Dependencies: n/a - Twitter handle: ugh, Twitter :D --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 23:49:35 +00:00
Mohammad Mohtashim	fa155a422f	[Community]: `requests_kwargs` not being used in _fetch (#28646 ) - Description: `requests_kwargs` is not passed to `_fetch`, which fetches pages asynchronously. This PR passes `requests_kwargs` to `_fetch`, just as is already done for `_scrape`. - Issue: #28634 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 23:46:54 +00:00
Mohammad Mohtashim	a37afbe353	mistral[minor]: Added Retrying Mechanism in case of Request Rate Limit Error for `MistralAIEmbeddings` (#27818 ) - Description: In the event of a Rate Limit Error from the MistralAI server, the response JSON raises a KeyError. To address this, a simple retry mechanism has been implemented to handle cases where the request limit is exceeded. - Issue: #27790 --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-11 17:53:42 -05:00
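A retry mechanism of the kind this commit describes could look roughly like the sketch below. It is not the code from the PR: `RateLimitError`, `call_with_retry`, and the exponential backoff policy are all assumptions made for illustration, and the `sleep` parameter exists only so the delay can be stubbed out.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever error signals a rate-limited request."""


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff when it is rate limited."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                # Out of retries: surface the original error to the caller.
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            sleep(base_delay * 2**attempt)
    raise RuntimeError("unreachable: the loop always returns or raises")
```

The wrapped callable would be the embedding request itself. Backing off between attempts gives the server's rate-limit window time to reset instead of immediately repeating the failing call.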
Vincent Zhang	df5008fe55	community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators (#28207 ) ## Description We are submitting as a team of four for a project. Other team members are @RuofanChen03, @LikeWang10067, @TANYAL77. This pull request expands the filtering capabilities of the FAISS vectorstore by adding the MongoDB-style query operators listed below, with comprehensive tests for the added functionality. - $eq (equals) - $neq (not equals) - $gt (greater than) - $lt (less than) - $gte (greater than or equal) - $lte (less than or equal) - $in (membership in list) - $nin (not in list) - $and (all conditions must match) - $or (any condition must match) - $not (negation of condition) ## Issue This closes https://github.com/langchain-ai/langchain/issues/26379. ## Sample Usage ```python import faiss import asyncio from langchain_community.vectorstores import FAISS from langchain.schema import Document from langchain_huggingface import HuggingFaceEmbeddings embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2") documents = [ Document(page_content="Process customer refund request", metadata={"schema_type": "financial", "handler_type": "refund",}), Document(page_content="Update customer shipping address", metadata={"schema_type": "customer", "handler_type": "update",}), Document(page_content="Process payment transaction", metadata={"schema_type": "financial", "handler_type": "payment",}), Document(page_content="Handle customer complaint", metadata={"schema_type": "customer","handler_type": "complaint",}), Document(page_content="Process invoice payment", metadata={"schema_type": "financial","handler_type": "payment",}) ] async def search(vectorstore, query, schema_type, handler_type, k=2): schema_filter = {"schema_type": {"$eq": schema_type}} handler_filter = {"handler_type": {"$eq": handler_type}} combined_filter = { "$and": [ schema_filter, handler_filter, ] } base_retriever = 
vectorstore.as_retriever( search_kwargs={"k":k, "filter":combined_filter} ) return await base_retriever.ainvoke(query) async def main(): vectorstore = FAISS.from_texts( texts=[doc.page_content for doc in documents], embedding=embeddings, metadatas=[doc.metadata for doc in documents] ) def printt(title, documents): print(title) if not documents: print("\tNo documents found.") return for doc in documents: print(f"\t{doc.page_content}. {doc.metadata}") printt("Documents:", documents) printt('\nquery="process payment", schema_type="financial", handler_type="payment":', await search(vectorstore, query="process payment", schema_type="financial", handler_type="payment", k=2)) printt('\nquery="customer update", schema_type="customer", handler_type="update":', await search(vectorstore, query="customer update", schema_type="customer", handler_type="update", k=2)) printt('\nquery="refund process", schema_type="financial", handler_type="refund":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="refund", k=2)) printt('\nquery="refund process", schema_type="financial", handler_type="foobar":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="foobar", k=2)) print() if __name__ == "__main__":asyncio.run(main()) ``` ## Output ``` Documents: Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'} Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'} Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'} Handle customer complaint. {'schema_type': 'customer', 'handler_type': 'complaint'} Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'} query="process payment", schema_type="financial", handler_type="payment": Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'} Process invoice payment. 
{'schema_type': 'financial', 'handler_type': 'payment'} query="customer update", schema_type="customer", handler_type="update": Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'} query="refund process", schema_type="financial", handler_type="refund": Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'} query="refund process", schema_type="financial", handler_type="foobar": No documents found. ``` --------- Co-authored-by: ruofan chen <ruofan.is.awesome@gmail.com> Co-authored-by: RickyCowboy <like.wang@mail.utoronto.ca> Co-authored-by: Shanni Li <tanya.li@mail.utoronto.ca> Co-authored-by: RuofanChen03 <114096642+ruofanchen03@users.noreply.github.com> Co-authored-by: Like Wang <102838708+likewang10067@users.noreply.github.com>	2024-12-11 17:52:22 -05:00
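To make the operator semantics in the FAISS commit above concrete, here is a small pure-Python sketch of how such a MongoDB-style filter could be evaluated against a metadata dict. It is not the implementation from the PR: the names `matches` and `_COMPARISONS` are hypothetical, and treating a missing metadata field as a non-match is an assumption of this sketch.

```python
from typing import Any, Callable, Dict

# Field-level comparison operators named in the commit message.
_COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, operand: value == operand,
    "$neq": lambda value, operand: value != operand,
    "$gt": lambda value, operand: value > operand,
    "$lt": lambda value, operand: value < operand,
    "$gte": lambda value, operand: value >= operand,
    "$lte": lambda value, operand: value <= operand,
    "$in": lambda value, operand: value in operand,
    "$nin": lambda value, operand: value not in operand,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every clause of the filter."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)
        elif key not in metadata:
            # Assumption of this sketch: a missing field never matches.
            ok = False
        elif isinstance(cond, dict):
            # {"field": {"$op": operand, ...}}: every operator must hold.
            ok = all(
                _COMPARISONS[op](metadata[key], operand)
                for op, operand in cond.items()
            )
        else:
            # {"field": value} is shorthand for equality.
            ok = metadata[key] == cond
        if not ok:
            return False
    return True


# Metadata taken from the sample documents in the commit message.
docs = [
    {"schema_type": "financial", "handler_type": "refund"},
    {"schema_type": "customer", "handler_type": "update"},
    {"schema_type": "financial", "handler_type": "payment"},
    {"schema_type": "customer", "handler_type": "complaint"},
    {"schema_type": "financial", "handler_type": "payment"},
]
payment_filter = {
    "$and": [
        {"schema_type": {"$eq": "financial"}},
        {"handler_type": {"$eq": "payment"}},
    ]
}
payment_hits = [i for i, meta in enumerate(docs) if matches(meta, payment_filter)]
```

Here `payment_hits` selects the two payment documents, which agrees with the first query in the output quoted in the commit message.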
like	3048a9a26d	community: tongyi multimodal response format fix to support langchain (#28645 ) Description: The multimodal(tongyi) response format "message": {"role": "assistant", "content": [{"text": "图像"}]}}]} is not compatible with LangChain. Dependencies: No --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 21:13:26 +00:00
Bagatur	d0e662e43b	community[patch]: Release 0.3.11 (#28658 )	2024-12-10 20:51:13 +00:00
Bagatur	91227ad7fd	langchain[patch]: Release 0.3.11 (#28657 )	2024-12-10 12:28:14 -08:00
Bagatur	1fbd86a155	core[patch]: Release 0.3.24 (#28656 )	2024-12-10 20:19:21 +00:00
Bagatur	e6a62d8422	core,langchain,community[patch]: allow langsmith 0.2 (#28598 )	2024-12-10 18:50:58 +00:00
ccurme	bc4dc7f4b1	ollama[patch]: permit streaming for tool calls (#28654 ) Resolves https://github.com/langchain-ai/langchain/issues/28543 Ollama recently [released](https://github.com/ollama/ollama/releases/tag/v0.4.6) support for streaming tool calls. Previously we would override the `stream` parameter if tools were passed in. Covered in standard tests here: `c1d348e95d/libs/standard-tests/langchain_tests/integration_tests/chat_models.py (L893-L897)` Before, the test generates one message chunk: ```python [ AIMessageChunk( content='', additional_kwargs={}, response_metadata={ 'model': 'llama3.1', 'created_at': '2024-12-10T17:49:04.468487Z', 'done': True, 'done_reason': 'stop', 'total_duration': 525471208, 'load_duration': 19701000, 'prompt_eval_count': 170, 'prompt_eval_duration': 31000000, 'eval_count': 17, 'eval_duration': 473000000, 'message': Message( role='assistant', content='', images=None, tool_calls=[ ToolCall( function=Function(name='magic_function', arguments={'input': 3}) ) ] ) }, id='run-552bbe0f-8fb2-4105-ada1-fa38c1db444d', tool_calls=[ { 'name': 'magic_function', 'args': {'input': 3}, 'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2', 'type': 'tool_call', }, ], usage_metadata={ 'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187 }, tool_call_chunks=[ { 'name': 'magic_function', 'args': '{"input": 3}', 'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2', 'index': None, 'type': 'tool_call_chunk', } ] ) ] ``` After, it generates two (tool call in one, response metadata in another): ```python [ AIMessageChunk( content='', additional_kwargs={}, response_metadata={}, id='run-9a3f0860-baa1-4bae-9562-13a61702de70', tool_calls=[ { 'name': 'magic_function', 'args': {'input': 3}, 'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0', 'type': 'tool_call', }, ], tool_call_chunks=[ { 'name': 'magic_function', 'args': '{"input": 3}', 'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0', 'index': None, 'type': 'tool_call_chunk', }, ], ), AIMessageChunk( content='', 
additional_kwargs={}, response_metadata={ 'model': 'llama3.1', 'created_at': '2024-12-10T17:46:43.278436Z', 'done': True, 'done_reason': 'stop', 'total_duration': 514282750, 'load_duration': 16894458, 'prompt_eval_count': 170, 'prompt_eval_duration': 31000000, 'eval_count': 17, 'eval_duration': 464000000, 'message': Message( role='assistant', content='', images=None, tool_calls=None ), }, id='run-9a3f0860-baa1-4bae-9562-13a61702de70', usage_metadata={ 'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187 } ), ] ```	2024-12-10 12:54:37 -05:00
Johannes Mohren	c1d348e95d	doc-loader: retain Azure Doc Intelligence API metadata in Document parser (#28382 ) Description: This PR modifies the doc_intelligence.py parser in the community package to include all metadata returned by the Azure Doc Intelligence API in the Document object. Previously, only the parsed content (markdown) was retained, while other important metadata such as bounding boxes (bboxes) for images and tables was discarded. These image bboxes are crucial for supporting use cases like multi-modal RAG workflows when using Azure Doc Intelligence. The change ensures that all information returned by the Azure Doc Intelligence API is preserved by setting the metadata attribute of the Document object to the entire result returned by the API, rather than an empty dictionary. This extends the parser's utility for complex use cases without breaking existing functionality. Issue: This change does not address a specific issue number, but it resolves a critical limitation in supporting multimodal workflows when using the LangChain wrapper for the Azure API. Dependencies: No additional dependencies are required for this change. --------- Co-authored-by: jmohren <johannes.mohren@aol.de>	2024-12-10 11:22:58 -05:00
Alex Tonkonozhenko	0d20c314dd	Confluence Loader: Fix CQL loading (#27620 ) fix #12082	2024-12-10 11:05:23 -05:00
Katarina Supe	aba2711e7f	community: update Memgraph integration (#27017 ) Description: - Memgraph no longer relies on `Neo4jGraphStore` but implements `GraphStore`, just like other graph databases. - Memgraph no longer relies on `GraphQAChain`, but implements `MemgraphQAChain`, just like other graph databases. - The refresh schema procedure has been updated to try using `SHOW SCHEMA INFO`. The fallback uses Cypher queries (a combination of schema and Cypher) → LangChain integration no longer relies on MAGE library. - The schema structure has been reformatted. Regardless of the procedures used to get schema, schema structure is the same. - The `add_graph_documents()` method has been implemented. It transforms `GraphDocument` into Cypher queries and creates a graph in Memgraph. It implements the ability to use `baseEntityLabel` to improve speed (`baseEntityLabel` has an index on the `id` property). It also implements the ability to include sources by creating a `MENTIONS` relationship to the source document. - Jupyter Notebook for Memgraph has been updated. - Issue: / - Dependencies: / - Twitter handle: supe_katarina (DX Engineer @ Memgraph) Closes #25606	2024-12-10 10:57:21 -05:00


7272 Commits