Commit Graph

7272 Commits

Author SHA1 Message Date
ccurme
b78ae7817e
openai[patch]: trace strict in structured_output_kwargs (#30425) 2025-03-21 14:37:28 -04:00
ccurme
1de7fa8f3a
Revert "deepseek: temporarily bypass tests" (#30424)
Reverts langchain-ai/langchain#30423
2025-03-21 17:14:31 +00:00
ccurme
c74dfff836
deepseek: temporarily bypass tests (#30423)
Deepseek infra is not stable enough to get through integration tests.

The previous two attempts had two tests time out; both tests pass locally.
2025-03-21 17:08:35 +00:00
ccurme
7147903724
deepseek: release 0.1.3 (#30422) 2025-03-21 16:39:50 +00:00
Andras L Ferenczi
b5f49df86a
partner: ChatDeepSeek on openrouter not returning reasoning (#30240)
The DeepSeek model does not return reasoning when hosted on OpenRouter
(issue [30067](https://github.com/langchain-ai/langchain/issues/30067)).

The following code did not return reasoning:

```python
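import os

from langchain_deepseek import ChatDeepSeek

# Point ChatDeepSeek at OpenRouter instead of the DeepSeek API.
llm = ChatDeepSeek(
    model="deepseek/deepseek-r1:nitro",
    api_base="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)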
messages = [
    {"role": "system", "content": "You are an assistant."},
    {"role": "user", "content": "9.11 and 9.8, which is greater? Explain the reasoning behind this decision."}
]
response = llm.invoke(messages, extra_body={"include_reasoning": True})
print(response.content)
print(f"REASONING: {response.additional_kwargs.get('reasoning_content', '')}")
print(response)
```

The fix is to extract reasoning from
`response.choices[0].message["model_extra"]` and from
`choices[0].delta["reasoning"]`, and place it in the response's
`additional_kwargs`. The change is really just the addition of a couple of
one-line `if` statements.
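
A rough sketch of the kind of check described above, using plain dictionaries as stand-ins for the provider response objects. The helper name `extract_reasoning` and the dictionary layout are illustrative assumptions, not the code merged in this PR.

```python
from typing import Any, Optional


def extract_reasoning(message: dict[str, Any]) -> Optional[str]:
    """Return reasoning text from a message-like dict, if any is present."""
    # Native DeepSeek-style responses expose `reasoning_content` directly.
    if message.get("reasoning_content"):
        return message["reasoning_content"]
    # Assumed OpenRouter-style layout: reasoning nested under `model_extra`.
    model_extra = message.get("model_extra") or {}
    if model_extra.get("reasoning"):
        return model_extra["reasoning"]
    return None


# Copy the reasoning into additional_kwargs only when it was found.
additional_kwargs: dict[str, Any] = {}
reasoning = extract_reasoning(
    {"content": "9.8", "model_extra": {"reasoning": "9.80 > 9.11"}}
)
if reasoning is not None:
    additional_kwargs["reasoning_content"] = reasoning
```

Looking in both places keeps the behavior for native DeepSeek responses unchanged while also covering the nested layout.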

---------

Co-authored-by: andrasfe <andrasf94@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-21 16:35:37 +00:00
Vadym Barda
4852ab8d0a
core[patch]: more tests for trim_messages (#30421) 2025-03-21 16:19:52 +00:00
ccurme
e8e3b2bfae
ollama: release 0.3.0 (#30420) 2025-03-21 15:50:08 +00:00
Bob Merkus
5700646cc5
ollama: add reasoning model support (e.g. deepseek) (#29689)
# Description
This PR adds reasoning model support for `langchain-ollama` by
extracting reasoning token blocks, like those used in deepseek. It was
inspired by
[ollama-deep-researcher](https://github.com/langchain-ai/ollama-deep-researcher),
specifically the parsing of [thinking
blocks](6d1aaf2139/src/assistant/graph.py (L91)):
```python
  # TODO: This is a hack to remove the <think> tags w/ Deepseek models 
  # It appears very challenging to prompt them out of the responses 
  while "<think>" in running_summary and "</think>" in running_summary:
      start = running_summary.find("<think>")
      end = running_summary.find("</think>") + len("</think>")
      running_summary = running_summary[:start] + running_summary[end:]
```

The comment above notes that it is very hard to prompt the model out of
emitting the reasoning block, but we actually want the model to reason,
since this improves model performance. This implementation extracts the
thinking block, so the client can still expect a proper message to be
returned by `ChatOllama` (and use the reasoning content separately when
desired).
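
For a complete (non-streamed) response, the extraction idea can be sketched as below. This is a simplified illustration rather than the PR's implementation; the helper name `split_reasoning` is hypothetical, and it assumes well-formed `<think>...</think>` pairs.

```python
import re

# Non-greedy match so multiple <think> blocks are handled separately.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (content, reasoning)."""
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    content = THINK_BLOCK.sub("", text).strip()
    return content, reasoning


content, reasoning = split_reasoning("<think>2 is even.</think>\nYes, 2 is even.")
```

The reasoning text is kept rather than discarded, so a caller can store it under a key such as `reasoning_content`.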

This implementation takes the same approach as
[ChatDeepseek](5d581ba22c/libs/partners/deepseek/langchain_deepseek/chat_models.py (L215)),
which adds the reasoning content to
`chunk.additional_kwargs.reasoning_content`:
```python
  if hasattr(response.choices[0].message, "reasoning_content"):  # type: ignore
      rtn.generations[0].message.additional_kwargs["reasoning_content"] = (
          response.choices[0].message.reasoning_content  # type: ignore
      )
```

This should probably be handled upstream in ollama + ollama-python, but
this seems like a reasonably effective solution. This is a standalone
example of what is happening:

```python
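from typing import Any, AsyncIterator

from langchain_core.language_models import BaseChatModel
from langchain_core.messages import BaseMessage, BaseMessageChunk
from langchain_core.runnables import RunnableConfig


async def deepseek_message_astream(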
    llm: BaseChatModel,
    messages: list[BaseMessage],
    config: RunnableConfig | None = None,
    *,
    model_target: str = "deepseek-r1",
    **kwargs: Any,
) -> AsyncIterator[BaseMessageChunk]:
    """Stream responses from Deepseek models, filtering out <think> tags.

    Args:
        llm: The language model to stream from
        messages: The messages to send to the model

    Yields:
        Filtered chunks from the model response
    """
    # check if the model is deepseek based
    if (llm.name and model_target not in llm.name) or (hasattr(llm, "model") and model_target not in llm.model):
        async for chunk in llm.astream(messages, config=config, **kwargs):
            yield chunk
        return

    # Yield with a buffer, upon completing the <think></think> tags, move them to the reasoning content and start over
    buffer = ""
    async for chunk in llm.astream(messages, config=config, **kwargs):
        # start or append
        if not buffer:
            buffer = chunk.content
        else:
            buffer += chunk.content if hasattr(chunk, "content") else chunk

        # Process buffer to remove <think> tags
        if "<think>" in buffer or "</think>" in buffer:
            if hasattr(chunk, "tool_calls") and chunk.tool_calls:
                raise NotImplementedError("tool calls during reasoning should be removed?")
            if "<think>" in chunk.content or "</think>" in chunk.content:
                continue
            chunk.additional_kwargs["reasoning_content"] = chunk.content
            chunk.content = ""
        # upon block completion, reset the buffer
        if "<think>" in buffer and "</think>" in buffer:
            buffer = ""
        yield chunk

```

# Issue
Integrating reasoning models (e.g. deepseek-r1) into existing
LangChain-based workflows is hard because of the thinking blocks included
in the message contents. To avoid this, we could align the `ChatOllama`
integration with `ChatDeepseek` so that it returns the reasoning content in
`message.additional_kwargs.reasoning_content` instead.

# Dependencies
None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-21 15:44:54 +00:00
ccurme
d8145dda95
xai: release 0.2.2 (#30403) 2025-03-20 20:25:16 +00:00
ccurme
e194902994
mistral: release 0.2.9 (#30402) 2025-03-20 20:22:24 +00:00
ccurme
49466ec9ca
groq: release 0.3.1 (#30401) 2025-03-20 20:19:49 +00:00
ccurme
db1e340387
fireworks: release 0.2.8 (#30400) 2025-03-20 16:15:51 -04:00
ccurme
785a8e7d45
tests: release 0.3.15 (#30397) 2025-03-20 15:38:40 -04:00
ccurme
5588ca4cfb
core: release 0.3.47 (#30396) 2025-03-20 18:52:53 +00:00
ccurme
de3960d285
multiple: enforce standards on tool_choice (#30372)
- Test if models support forcing tool calls via `tool_choice`. If they
do, they should support
  - `"any"` to specify any tool
  - the tool name as a string to force calling a particular tool
- Add `tool_choice` to signature of `BaseChatModel.bind_tools` in core
- Deprecate `tool_choice_value` in standard tests in favor of a boolean
`has_tool_choice`
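
To illustrate the two standard values, here is a hedged sketch of how an integration might translate them into an OpenAI-style request payload. The function `normalize_tool_choice` is hypothetical and not part of LangChain; the `"required"` and `{"type": "function", ...}` shapes follow the OpenAI chat completions convention, and other providers use different ones.

```python
from typing import Any, Union


def normalize_tool_choice(
    tool_choice: str, tool_names: list[str]
) -> Union[str, dict[str, Any]]:
    """Map a standard `tool_choice` value to an OpenAI-style payload."""
    if tool_choice == "any":
        # OpenAI-style APIs spell "call at least one tool" as "required".
        return "required"
    if tool_choice in tool_names:
        # A tool name forces a call to that particular tool.
        return {"type": "function", "function": {"name": tool_choice}}
    raise ValueError(f"Unknown tool_choice: {tool_choice!r}")


forced_any = normalize_tool_choice("any", ["get_weather"])
forced_named = normalize_tool_choice("get_weather", ["get_weather"])
```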

Will follow up with PRs in external repos (tested in AWS and Google
already).
2025-03-20 17:48:59 +00:00
ccurme
b86cd8270c
multiple: support strict and method in with_structured_output (#30385) 2025-03-20 13:17:07 -04:00
Mohammad Mohtashim
1103bdfaf1
(Ollama) Fix String Value parsing in _parse_arguments_from_tool_call (#30154)
- **Description:** Fix String Value parsing in
_parse_arguments_from_tool_call
- **Issue:** #30145

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-19 21:47:18 -04:00
Tim König
b5992695ae
community: add ZoteroRetriever (#30270)
**Description** 
This contribution adds a retriever for the Zotero API.
[Zotero](https://www.zotero.org/) is an open-source reference manager
for bibliographic data and related research materials. A retriever will
allow langchain applications to retrieve relevant documents from
personal or shared group libraries, which I believe will be helpful for
numerous applications, such as RAG systems, personal research
assistants, etc. Tests and docs were added.

The documentation provided assumes the retriever will be part of the
langchain-community package, as this seemed customary. Please let me
know if this is not the preferred way to do it. I also uploaded the
implementation to PyPI.

**Dependencies**
The retriever requires the `pyzotero` package for API access. This
dependency is stated in the docs, and the retriever will return an error
if the package is not found. However, this dependency is not added to
the langchain package itself.

**Twitter handle**
I'm no longer using Twitter, but I'd appreciate a shoutout on
[Bluesky](https://bsky.app/profile/koenigt.bsky.social) or
[LinkedIn](https://www.linkedin.com/in/dr-tim-k%C3%B6nig-534aa2324/)!


Let me know if there are any issues, I'll gladly try and sort them out!

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-19 20:19:32 -04:00
pulvedu
4346aca5cf
Integration update (#30381)
This pull request includes changes to the following:
- docs/docs/integrations/tools/tavily_search.ipynb 
- docs/docs/integrations/tools/tavily_extract.ipynb
- added docs/docs/integrations/providers/tavily.mdx

---------

Co-authored-by: pulvedu <dustin@tavily.com>
2025-03-19 17:58:25 -04:00
Daniel Rauber
9b687d7fbd
community[minor]: PlaywrightURLLoader can take stored session file (#30152)
**Description:**
Implements an additional `browser_session` parameter on
PlaywrightURLLoader which can be used to initialize the browser context
by providing a stored playwright context.
2025-03-19 16:29:07 -04:00
Vadym Barda
73c04f4707
core[patch]: release 0.3.46 (#30383) 2025-03-19 15:09:08 -04:00
William FH
ce84f8ba7e
Dereference run tree (#30377) 2025-03-19 19:05:06 +00:00
William FH
8265be4d3e
Unset context to None in var (#30380) 2025-03-19 18:53:17 +00:00
William FH
4130e6476b
Unset context after step (#30378)
While we are already careful to copy before setting the config, if other
objects hold a reference to the config or context, it wouldn't be
cleared.
2025-03-19 11:46:23 -07:00
Vadym Barda
37190881d3
core[patch]: add util for approximate token counting (#30373) 2025-03-19 17:48:38 +00:00
Matthew Farrellee
5f812f5968
langchain-tests: skip instead of passing image message tests (#30375)
**Description:** use skip for image message tests
2025-03-19 15:35:32 +00:00
ccurme
aae8306d6c
groq: release 0.3.0 (#30374) 2025-03-19 15:23:30 +00:00
Ashwin
83cfb9691f
Fix typo: change 'ben' to 'be' in comment (#30358)
**Description:**  
This PR fixes a minor typo in the comments within
`libs/partners/openai/langchain_openai/chat_models/base.py`. The word
"ben" has been corrected to "be" for clarity and professionalism.

**Issue:**  
N/A

**Dependencies:**  
None
2025-03-19 10:35:35 -04:00
Florian Chappaz
07cb41ea9e
community: aligning ChatLiteLLM default parameters with litellm (#30360)
**Description:**
Since `ChatLiteLLM` is forwarding most parameters to
`litellm.completion(...)`, there is no reason to set other default
values than the ones defined by `litellm`.

In the case of parameter 'n', it also provokes an issue when trying to
call a serverless endpoint on Azure, as it is considered an extra
parameter. So we need to keep it optional.

We can debate about backward compatibility of this change: in my
opinion, there should not be big issues since from my experience,
calling `litellm.completion()` without these parameters works fine.

**Issue:** 
- #29679 

**Dependencies:** None
2025-03-19 09:07:28 -04:00
Hodory
57ffacadd0
community: add keep_newlines parameter to process_pages method (#30365)
- **Description:** Adding keep_newlines parameter to process_pages
method with page_ids on Confluence document loader
- **Issue:** N/A (This is an enhancement rather than a bug fix)
- **Dependencies:** N/A
- **Twitter handle:** N/A
2025-03-19 08:57:59 -04:00
William FH
f5a0092551
Rm test for parent_run presence (#30356) 2025-03-18 19:44:19 -07:00
Adam Brenner
f949d9a3d3
docs: Add Dell PowerScale Document Loader (#30209)
# Description
Adds documentation on the LangChain website for a Dell-specific document
loader for on-prem storage devices. Additional details on the document
loader are described in the PR as well as in our GitHub repo:
[https://github.com/dell/powerscale-rag-connector](https://github.com/dell/powerscale-rag-connector)

This PR also creates a category on the document loader webpage, as no
category currently exists for on-prem. This follows the pattern already
established by the website's category for cloud providers.

# Issue:
New release, no issue.

# Dependencies:

None

# Twitter handle:

DellTech

---------

Signed-off-by: Adam Brenner <adam@aeb.io>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-18 22:39:21 -04:00
ccurme
9fb0db6937
community: release 0.3.20 (#30354) 2025-03-18 21:57:12 +00:00
ccurme
168f1dfd93
langchain[patch]: update text-splitters min bound (#30352) 2025-03-18 20:53:43 +00:00
ccurme
f6cf2ce2ad
langchain[patch]: lock with latest text-splitters (#30350) 2025-03-18 19:29:11 +00:00
ccurme
2909b49045
langchain: release 0.3.21 (#30348) 2025-03-18 19:13:20 +00:00
ccurme
958f85d541
text-splitters: release 0.3.7 (#30347) 2025-03-18 19:11:37 +00:00
Lance Martin
46d6bf0330
ollama[minor]: update default method for structured output (#30273)
From function calling to Ollama's [dedicated structured output
feature](https://ollama.com/blog/structured-outputs).

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-18 12:44:22 -04:00
Marlene
ff8ce60dcc
Core: Adding Azure AI to Supported Chat Models (#30342)
- **Description:** I was testing out `init_chat` and saw that chat
models can now be inferred. Currently only Azure OpenAI is supported, but
we would like to add support for Azure AI, which is a different package.
This PR edits the `base.py` file to add the chat implementation.
- I don't think this adds any additional dependencies 
- Will add a test and lint, but starting an initial draft PR. 

cc @santiagxf

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-18 11:53:20 -04:00
TheSongg
251551ccf1
doc: Implement langchain-xinference (#30296)
- [ ] **PR title**: Implement langchain-xinference

- [ ] **PR message**: 
Implement a standalone package for Xinference chat models and llm
models.

https://github.com/langchain-ai/langchain/issues/30045#issue-2887214214
2025-03-18 11:50:16 -04:00
wenmeng zhou
5a6e1254a7
support return reasoning content for models like qwq in dashscope (#30317)
Here is an example:
```python
from langchain_community.chat_models.tongyi import ChatTongyi
from langchain_core.messages import HumanMessage

chatLLM = ChatTongyi(
    model="qwq-32b",   # refer to  https://help.aliyun.com/zh/model-studio/getting-started/models for more models
)
res = chatLLM.stream([HumanMessage(content="how much is 1 plus 1")])
for r in res:
    print(r)
```

```shell
content='' additional_kwargs={'reasoning_content': 'Okay, so the'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' user is asking "'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': 'how much is 1 plus'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 1." Let me think'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' about this. Hmm'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ', 1 plus'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': " 1... That's a pretty"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' basic math question. I'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' remember from arithmetic that when'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' you add 1 and'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 1 together, the'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' result is 2.'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' But wait, maybe'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' I should double-check to be'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' sure. Let me visualize it'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': '. If I have one apple'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' and someone gives me another'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' apple, I have'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' two apples total. Yeah,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' that makes sense. Or'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' on a number line'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ', starting at 1 and'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' moving 1 step forward lands'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' you at 2'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': '. \n\nIs there any'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' context where 1 +'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 1 might not equal'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 2? Like in different'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' number bases? Let'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': "'s see. In base"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 10, which'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' is standard,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 1+1 is'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 2. But if'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' we were in binary'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' (base 2'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': '), 1 +'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 1 would be 1'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': '0. But the question'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': " doesn't specify a base,"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' so I think the'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' default is base 10'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': '. \n\nAlternatively, could'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' this be a trick'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' question? Maybe they'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': "'re referring to something else"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ', like in Boolean'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' algebra where 1 +'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 1 might still'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' be 1 in'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' some contexts? Wait'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ', no, in Boolean'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' addition, 1'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' + 1 is typically'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': " 1 because it's logical"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' OR. But the'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' question just says "1'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' plus 1," which is'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' more arithmetic than Boolean.'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' \n\nOr maybe in some other'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' mathematical structure like modular arithmetic?'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' For example, modulo'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 2,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 1 + 1 is'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 0. But again'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ', unless specified, it'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': "'s probably standard addition"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': '. \n\nThe user might be'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' testing if I know basic'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' math, or maybe'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': " they're a student just"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' starting out. Either way,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' the straightforward answer is'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 2. I should also'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': " consider if there's any cultural"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' references or jokes where'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 1 + 1 equals'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' something else, but I can'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': "'t think of any common"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' ones. \n\nAlternatively'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ', in some contexts like'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' in chemistry,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' 1 + 1 could refer'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' to mixing solutions, but that'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': "'s not standard. The question"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' is pretty simple,'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' so I think the answer'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' is 2. To'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' be thorough, maybe mention'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' that in standard arithmetic it'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': "'s 2, but if"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': " there's a different"} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' context, the answer'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' might vary. But since'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' no context is given'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ', 2 is the safest'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ' answer.'} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='The result' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' of 1 plus' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' 1 is **2**.' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' \n\nIn standard arithmetic (base' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' 10), adding' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' 1 and 1 together' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' yields 2. This is' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' a fundamental mathematical principle. If' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' the question involves a different context' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' (e.g., binary' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=', modular arithmetic, or a' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' metaphorical meaning), it' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' would need clarification,' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' but under typical circumstances, the' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content=' answer is **2**.' additional_kwargs={'reasoning_content': ''} response_metadata={} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'
content='' additional_kwargs={'reasoning_content': ''} response_metadata={'finish_reason': 'stop', 'request_id': '4738c641-6bd8-9efc-a4fe-d929d4e62bef', 'token_usage': {'input_tokens': 16, 'output_tokens': 560, 'total_tokens': 576}} id='run-bd026918-16e5-429f-aa75-3ff7701e9f8d'

```
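
Since each streamed chunk carries only a fragment, a caller will usually want to stitch the fragments back together. A minimal sketch, using a hypothetical `Chunk` dataclass as a stand-in for the streamed message chunks shown above:

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class Chunk:
    """Minimal stand-in for a streamed message chunk (hypothetical)."""

    content: str = ""
    additional_kwargs: dict[str, Any] = field(default_factory=dict)


def collect(chunks: Iterable[Chunk]) -> tuple[str, str]:
    """Concatenate streamed chunks into (reasoning, answer)."""
    reasoning_parts: list[str] = []
    answer_parts: list[str] = []
    for chunk in chunks:
        # Chunks without reasoning contribute an empty string.
        reasoning_parts.append(chunk.additional_kwargs.get("reasoning_content", ""))
        answer_parts.append(chunk.content)
    return "".join(reasoning_parts), "".join(answer_parts)


reasoning, answer = collect(
    [
        Chunk(additional_kwargs={"reasoning_content": "Okay, so the"}),
        Chunk(additional_kwargs={"reasoning_content": " user is asking"}),
        Chunk(content="The result", additional_kwargs={"reasoning_content": ""}),
        Chunk(content=" of 1 plus 1 is **2**."),
    ]
)
```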

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-03-18 11:43:10 -04:00
ccurme
b91daf06eb
groq[minor]: remove default model (#30341)
The default model for `ChatGroq`, `"mixtral-8x7b-32768"`, is being
retired on March 20, 2025. Here we remove the default, such that model
names must be explicitly specified (being explicit is a good practice
here, and avoids the need for breaking changes down the line). This
change will be released in a minor version bump to 0.3.

This follows https://github.com/langchain-ai/langchain/pull/30161
(released in version 0.2.5), where we began generating warnings to this
effect.

![Screenshot 2025-03-18 at 10 33
27 AM](https://github.com/user-attachments/assets/f1e4b302-c62a-43b0-aa86-eaf9271e86cb)
2025-03-18 10:50:34 -04:00
amuwall
f6a17fbc56
community: fix import exception too constrictive (#30218)
Fixes issue #30097.
2025-03-17 22:09:02 -04:00
qonnop
036f00dc92
community: support in-memory data (Blob.from_data) in all audio parsers (#30262)
OpenAIWhisperParser, OpenAIWhisperParserLocal, YandexSTTParser do not
handle in-memory audio data (loaded via Blob.from_data) correctly. They
require Blob.path to be set and AudioSegment is always read from the
file system. In-memory data is handled correctly only for
FasterWhisperParser so far. I changed OpenAIWhisperParser,
OpenAIWhisperParserLocal, YandexSTTParser accordingly to match
FasterWhisperParser.
Thanks for reviewing the PR!

Co-authored-by: qonnop <qonnop@users.noreply.github.com>
2025-03-17 19:52:33 -04:00
Matthew Farrellee
1985aaf095
langchain-tests: allow subclasses to add addition, non-standard tests (#30204)
**Description:** The `ChatModel[Integration]Tests` classes are powerful
and helpful; this change allows subclasses to add additional tests.

For instance:

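```python
from langchain_core.language_models import BaseChatModel
from langchain_tests.integration_tests import ChatModelIntegrationTests

class TestChatMyServiceIntegration(ChatModelIntegrationTests):
    ...
    def test_myservice(self, model: BaseChatModel) -> None:
        ...
```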

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-03-17 23:37:16 +00:00
Ben
789db7398b
text-splitters: Add JSFrameworkTextSplitter for Handling JavaScript Framework Code (#28972)
## Description
This pull request introduces a new text splitter,
`JSFrameworkTextSplitter`, to the LangChain library. The
`JSFrameworkTextSplitter` extends the `RecursiveCharacterTextSplitter`
to handle JavaScript framework code effectively, including React (JSX),
Vue, and Svelte. It identifies and utilizes framework-specific component
tags and syntax elements as splitting points, alongside standard
JavaScript syntax. This ensures that code is divided at natural
boundaries, enhancing the parsing and processing of JavaScript and
framework-specific code.

### Key Features
- Supports React (JSX), Vue, and Svelte frameworks.
- Identifies and uses framework-specific tags and syntax elements as
natural splitting points.
- Extends the existing `RecursiveCharacterTextSplitter` for seamless
integration.

## Issue
No specific issue addressed.

## Dependencies
No additional dependencies required.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-03-17 23:32:33 +00:00
ccurme
5684653775
openai[patch]: release 0.3.9 (#30325) 2025-03-17 16:08:41 +00:00
ccurme
eb9b992aa6
openai[patch]: support additional Responses API features (#30322)
- Include response headers
- Max tokens
- Reasoning effort
- Fix bug with structured output / strict
- Fix bug with simultaneous tool calling + structured output
2025-03-17 12:02:21 -04:00
Bae-ChangHyun
d8510270ee
community: add 'extract' mode to FireCrawlLoader for structured data extraction (#30242)
**Description:**
Added an 'extract' mode to FireCrawlLoader that enables structured data
extraction from web pages. This feature lets users extract structured
data from a single URL or an entire website using Large Language Models
(LLMs).
See the [firecrawl
docs](https://docs.firecrawl.dev/features/extract-beta) for more
parameters and usage.
Currently only one URL can be extracted at a time (this depends on
firecrawl's extract method).

**Dependencies:** 
No new dependencies required. Uses existing FireCrawl API capabilities.

---------

Co-authored-by: chbae <chbae@gcsc.co.kr>
Co-authored-by: ccurme <chester.curme@gmail.com>
2025-03-17 15:15:57 +00:00
qonnop
747efa16ec
community: fix CPU support for FasterWhisperParser (implicit compute type for WhisperModel) (#30263)
FasterWhisperParser fails on a machine without an NVIDIA GPU: "Requested
float16 compute type, but the target device or backend do not support
efficient float16 computation." This problem arises because the
WhisperModel is called with compute_type="float16", which works only on
NVIDIA GPUs.

According to the [CTranslate2
docs](https://opennmt.net/CTranslate2/quantization.html#bit-floating-points-float16)
float16 is supported only on NVIDIA GPUs. Removing the compute_type
parameter solves the problem for CPUs. According to the [CTranslate2
docs](https://opennmt.net/CTranslate2/quantization.html#quantize-on-model-loading)
setting compute_type to "default" (standard when omitting the parameter)
uses the original compute type of the model or performs implicit
conversion for the specific computation device (GPU or CPU). I suggest
removing compute_type="float16".

@hulitaitai you are the original author of the FasterWhisperParser - is
there a reason for setting the parameter to float16?

Thanks for reviewing the PR!

Co-authored-by: qonnop <qonnop@users.noreply.github.com>
2025-03-14 22:22:29 -04:00
ccurme
c74e7b997d
openai[patch]: support structured output via Responses API (#30265)
Also runs all standard tests using Responses API.
2025-03-14 15:14:23 -04:00
Priyansh Agrawal
f54f14b747
community: cube document loader - do not load non-public dimensions and measures (#30286)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"

- **Description:** Do not load non-public dimensions and measures
(public: false) with Cube semantic loader

- **Issue:** Currently, the Cube document loader loads non-public
dimensions and measures, so downstream applications end up using them,
which Cube does not allow.


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
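
The filtering described in this change (skipping members marked `public: false`) can be sketched roughly as below. This is only an illustration: the `public_members` helper and the shape of the metadata dictionaries are assumptions made for the example, not the loader's actual code.

```python
from typing import Any, Dict, List


def public_members(members: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only members that are not explicitly marked as non-public.

    Assumption: a member without a "public" key is treated as public.
    """
    return [m for m in members if m.get("public", True)]


# Hypothetical metadata, shaped loosely like a list of cube dimensions.
dimensions = [
    {"name": "orders.status", "public": True},
    {"name": "orders.internal_score", "public": False},
    {"name": "orders.created_at"},  # no flag: kept under the assumption above
]

visible = [m["name"] for m in public_members(dimensions)]
print(visible)
```

Only the visible members would then be turned into documents, so downstream applications never see the hidden ones.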
2025-03-14 15:07:56 -04:00
Stavros Kontopoulos
ac22cde130
langchain_ollama: Support keep_alive in embeddings (#30251)
- Description: Adds support for keep_alive in Ollama Embeddings; see
https://github.com/ollama/ollama/issues/6401.
Builds on top of
https://github.com/langchain-ai/langchain/pull/29296. My use case is
keeping the embeddings model loaded on the CPU indefinitely.
- Dependencies: no new dependencies are introduced.
- Issue: no issue has been created yet.
2025-03-14 14:56:50 -04:00
homeffjy
2c99f12062
community[patch]: fix bilibili loader handling of multi-page content (#30283)
Previously the loader would only extract subtitles from the first page
of multi-page videos.
2025-03-14 14:53:03 -04:00
ccurme
d5d0134e7b
anthropic: release 0.3.10 (#30287) 2025-03-14 16:23:21 +00:00
ccurme
226f29bc96
anthropic: support built-in tools, improve docs (#30274)
- Support features from recent update:
https://www.anthropic.com/news/token-saving-updates (mostly adding
support for built-in tools in `bind_tools`)
- Add documentation around prompt caching, token-efficient tool use, and
built-in tools.
2025-03-14 16:18:50 +00:00
Priyansh Agrawal
f27e2d7ce7
community: cube document loader - fix logging (#30285)

- **Description:** Fix a bad log message on line 56 and replace f-string
logs with format specifiers

- **Issue:** Log messages such as this one were emitted with the
placeholder left unformatted:
`INFO:langchain_community.document_loaders.cube_semantic:Loading
dimension values for: {dimension_name}...`
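
A minimal sketch of the two logging styles involved, assuming the bad message came from a string that was never interpolated (the commit does not show the original line). The logger name, the `ListHandler` class, and the sample dimension name are made up for the example.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the result can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


logger = logging.getLogger("cube_logging_sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

dimension_name = "orders.status"  # made-up value

# Plain string without interpolation: the placeholder is logged literally.
logger.info("Loading dimension values for: {dimension_name}...")

# Format specifier: logging substitutes the argument when the record is emitted.
logger.info("Loading dimension values for: %s...", dimension_name)

print(handler.messages)
```

Passing arguments separately also defers string formatting until a handler actually emits the record.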

2025-03-14 11:36:18 -04:00
ccurme
bbd4b36d76
mistralai[patch]: bump core (#30278) 2025-03-13 23:04:36 +00:00
ccurme
315bb17ef5
core: release 0.3.45 (#30277) 2025-03-13 22:44:23 +00:00
pulvedu
d0bfc7f820
community[fix] : Pass API_KEY as argument (#30272)
PR Title:
community: Fix Pass API_KEY as argument

PR Message:
Description:
This PR fixes validation error "Value error, Did not find
tavily_api_key, please add an environment variable `TAVILY_API_KEY`
which contains it, or pass `tavily_api_key` as a named parameter."

Dependencies:
No new dependencies introduced.

---------

Co-authored-by: pulvedu <dustin@tavily.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-13 22:19:31 +00:00
ccurme
733abcc884
mistral: release 0.2.8 (#30275) 2025-03-13 21:54:34 +00:00
Jacob Lee
e9c1765967
fix(core): Ignore missing secrets on deserialization (#30252) 2025-03-13 12:27:03 -07:00
ccurme
ebea5e014d
standard tests: test simple agent loop (#30268) 2025-03-13 16:34:12 +00:00
ccurme
cd1ea8e94d
openai[patch]: support Responses API (#30231)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2025-03-12 12:25:46 -04:00
Jason Zhang
49bdd3b6fe
docs: Add AgentQL provider doc, tool/toolkit doc and documentloader doc (#30144)
- **Description:** Added AgentQL docs for the provider page, tools page
and documentloader page
- **Twitter handle:** @AgentQL

Repo:
https://github.com/tinyfish-io/agentql-integrations/tree/main/langchain
PyPI: https://pypi.org/project/langchain-agentql/


---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-11 21:57:40 -04:00
Vadym Barda
23fa70f328
core[patch]: release 0.3.44 (#30236) 2025-03-11 18:59:02 -04:00
Vadym Barda
c7842730ef
core[patch]: support single-node subgraphs and put subgraph nodes under the respective subgraphs (#30234) 2025-03-11 18:55:45 -04:00
ccurme
62c570dd77
standard-tests, openai: bump core (#30202) 2025-03-10 19:22:24 +00:00
ccurme
f896e701eb
deepseek: install local langchain-tests in test deps (#30198) 2025-03-10 16:58:17 +00:00
Hugh Gao
aa6dae4a5b
community: Remove the system message count limit for ChatTongyi. (#30192)
## Description
DashScope models support multiple `SystemMessage`s. See the
[documentation](https://bailian.console.aliyun.com/model_experience_center/text#/model-market/detail/qwen-long?tabKey=sdk);
the example code from that page is:
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If the environment variable is not set, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # DashScope service base_url
)
# Initialize the messages list
completion = client.chat.completions.create(
    model="qwen-long",
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        # Replace 'file-fe-xxx' with the file-id used in your actual conversation.
        {'role': 'system', 'content': 'fileid://file-fe-xxx'},
        {'role': 'user', 'content': 'What is this article about?'}
    ],
    stream=True,
    stream_options={"include_usage": True}
)

full_content = ""
for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content:
        # Append the streamed content
        full_content += chunk.choices[0].delta.content
        print(chunk.model_dump())

print(full_content)
```
Tip: The example code uses the OpenAI SDK, but the documentation says
the DashScope API is also supported; I tested it, and it works.
```
Is the Dashscope SDK invocation method compatible?

Yes, the Dashscope SDK remains compatible for model invocation. However, file uploads and file-ID retrieval are currently only supported via the OpenAI SDK. The file-ID obtained through this method is also compatible with Dashscope for model invocation.
```
2025-03-10 08:58:40 -04:00
ccurme
67aff1648b
community: Add OpenGradient integration (Toolkit) (#30190)
Commandeering https://github.com/langchain-ai/langchain/pull/30135

---------

Co-authored-by: kylexqian <kylexqian@gmail.com>
2025-03-09 18:08:07 -04:00
ccurme
b209d46eb3
mistral[patch]: set global ssl context (#30189) 2025-03-09 21:27:41 +00:00
Vijay Selvaraj
df459d0d5e
community: add Valthera integration (#30105)
```markdown
**Description:**  
This PR integrates Valthera into LangChain, introducing a framework designed to let an LLM agent send highly personalized nudges. It is modeled after Dr. BJ Fogg's Behavior Model. This integration includes:

- Custom data connectors for HubSpot, PostHog, and Snowflake.
- A unified data aggregator that consolidates user data.
- Scoring configurations to compute motivation and ability scores.
- A reasoning engine that determines the appropriate user action.
- A trigger generator to create personalized messages for user engagement.

**Issue:**  
N/A

**Dependencies:**  
N/A

**Twitter handle:**  
- `@vselvarajijay`

**Tests and Docs:**  
- `docs/docs/integrations/tools/valthera` 
- `https://github.com/valthera/langchain-valthera/tree/main/tests`

```

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-09 21:19:08 +00:00
ccurme
3823daa0b9
cli: update integration doc template for tools (#30188)
Chain example -> langgraph agent
2025-03-09 21:14:43 +00:00
Jonathan Feng
911accf733
docs: add contextualai documentation (#30050)
 
**Description:** adds ContextualAI's `langchain-contextual` package's
documentation


---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-09 02:43:13 +00:00
Bharat
b9746a6910
fixes#30182: update tool names to match OpenAI function name pattern (#30183)
The OpenAI API requires function names to match the pattern
'^[a-zA-Z0-9_-]+$'. This updates the JIRA toolkit's tool names to use
underscores instead of spaces to comply with this requirement and
prevent BadRequestError when using the tools with OpenAI functions.

Error fixed:
```
File "langgraph-bug-fix/.venv/lib/python3.13/site-packages/openai/_base_client.py", line 1023, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'tools[0].function.name': string does not match pattern. Expected a string that matches the pattern '^[a-zA-Z0-9_-]+$'.", 'type': 'invalid_request_error', 'param': 'tools[0].function.name', 'code': 'invalid_value'}}
During task with name 'agent' and id 'aedd7537-e8d5-6678-d0c5-98129586d3ac'
```
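
The renaming described above can be sketched as follows. This is an illustrative helper, not the toolkit's code: `to_function_name` is a hypothetical name, and the sample tool names are made up. Only the pattern itself comes from the error message.

```python
import re

# Pattern quoted in the OpenAI error message above.
OPENAI_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def to_function_name(name: str) -> str:
    """Replace whitespace runs with underscores and drop other invalid characters."""
    candidate = re.sub(r"\s+", "_", name.strip())
    candidate = re.sub(r"[^a-zA-Z0-9_-]", "", candidate)
    if not candidate:
        raise ValueError(f"cannot derive a valid function name from {name!r}")
    return candidate


# Made-up tool names containing spaces.
for raw in ["JQL Query", "Get Projects"]:
    fixed = to_function_name(raw)
    print(raw, "->", fixed, bool(OPENAI_NAME_PATTERN.match(fixed)))
```

Validating against the same pattern before sending a request turns a remote 400 error into a local, easier-to-debug failure.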

Issue: #30182
2025-03-08 20:48:25 -05:00
ccurme
cee0fecb08
docs: update package registry counts (#30181) 2025-03-08 20:37:59 -05:00
William FH
bac3a28e70
Flush (#30157) 2025-03-07 16:32:15 -08:00
ccurme
a7ab5e8372
community[patch]: ChatPerplexity: track usage metadata (#30175) 2025-03-07 23:25:05 +00:00
ccurme
1c993b921c
core[patch]: release 0.3.43 (#30173) 2025-03-07 21:56:00 +00:00
ccurme
9893e5cb80
core[patch]: catch structured_output_format (#30172)
Change to `ls_structured_output_format` was not backward-compatible with
older versions of integration packages.
2025-03-07 16:50:06 -05:00
ccurme
33a3510243
core[patch]: export ArgsSchema (#30169)
This is needed for type hints

see: https://github.com/langchain-ai/langchain/pull/30167
2025-03-07 20:43:05 +00:00
ccurme
17507c9ba6
groq[patch]: release 0.2.5 (#30168) 2025-03-07 20:25:51 +00:00
andyzhou1982
9e863c89d2
add JiebaLinkExtractor for chinese doc extracting (#30150)

- [ ] **PR title**: "community: chinese doc extracting"


- [ ] **PR message**: 
    - **Description:** add `jieba_link_extractor.py` for Chinese document
extraction
    - **Dependencies:** jieba


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
  /doc/doc/integrations/providers/jieba.md
  /doc/doc/integrations/vectorstores/jieba_link_extractor.ipynb
  /libs/packages.yml

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-07 20:21:46 +00:00
ccurme
74e7772a5f
groq[patch]: warn if model is not specified (#30161)
Groq is retiring `mixtral-8x7b-32768`, which is currently the default
model for ChatGroq, on March 20. Here we emit a warning if the model is
not specified explicitly.

A version 0.3.0 will be released ahead of March 20 that removes the
default altogether.
2025-03-07 15:21:13 -05:00
Ioannis Bakagiannis
3444e587ee
docs: Integration Update - ADS4GPTs (#30153)
docs: New integration for LangChain - ads4gpts-langchain

Description: Tools and Toolkit for Agentic integration natively within
LangChain with ADS4GPTs, in order to help applications monetize with
advertising.

Twitter handle: @ads4gpts

Co-authored-by: knitlydevaccount <loom+github@knitly.app>
2025-03-07 14:35:44 -05:00
ccurme
3c258194ae
tests[patch]: release 0.3.14 (#30165) 2025-03-07 18:34:05 +00:00
ccurme
34638ccfae
openai[patch]: release 0.3.8 (#30164) 2025-03-07 18:26:40 +00:00
ccurme
4e5058f29c
core[patch]: release 0.3.42 (#30163) 2025-03-07 18:14:45 +00:00
Eugene Yurtsev
894fd63a61
cli: release 0.0.36 (#30159)
Bump for 0.0.36
2025-03-07 13:05:40 -05:00
ccurme
806211475a
core[patch]: update structured output tracing (#30123)
- Trace JSON schema in `options`
- Rename to `ls_structured_output_format`
2025-03-07 13:05:25 -05:00
ccurme
230876a7c5
anthropic[patch]: add PDF input example to API reference (#30156) 2025-03-07 14:19:08 +00:00
joeconstantino
022ff9eead
Tableau docs for new datasource qa tool (#30125)
- **Description:** a notebook showing LangChain and LangGraph agents using
the new langchain_tableau tool
- **Twitter handle:** @joe_constantin0

---------

Co-authored-by: Joe Constantino <joe@constantino.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-06 14:58:56 +00:00
ccurme
52b0570bec
core, openai, standard-tests: improve OpenAI compatibility with Anthropic content blocks (#30128)
- Support thinking blocks in core's `convert_to_openai_messages` (pass
through instead of error)
- Ignore thinking blocks in ChatOpenAI (instead of error)
- Support Anthropic-style image blocks in ChatOpenAI

---

Standard integration tests include a `supports_anthropic_inputs`
property which is currently enabled only for tests on `ChatAnthropic`.
This test enforces compatibility with message histories of the form:
```
- system message
- human message
- AI message with tool calls specified only through `tool_use` content blocks
- human message containing `tool_result` and an additional `text` block
```
It additionally checks support for Anthropic-style image inputs if
`supports_image_inputs` is enabled.

Here we change this test, such that if you enable
`supports_anthropic_inputs`:
- You support AI messages with text and `tool_use` content blocks
- You support Anthropic-style image inputs (if `supports_image_inputs`
is enabled)
- You support thinking content blocks.

That is, we add a test case for thinking content blocks, but we also
remove the requirement of handling tool results within HumanMessages
(motivated by existing agent abstractions, which should all return
ToolMessage). We move that requirement to a ChatAnthropic-specific test.
2025-03-06 09:53:14 -05:00
Pat Patterson
b3dc66f7a3
community: fix AttributeError when creating LanceDB vectorstore (#30127)
**Description:**

This PR adds a call to `guard_import()` to fix an AttributeError raised
when creating LanceDB vectorstore instance with an existing LanceDB
table.

**Issue:**

This PR fixes issue #30124.

**Dependencies:**

No additional dependencies.

**Twitter handle:**

[@metadaddy](https://x.com/metadaddy), but I spend more time at
[@metadaddy.net](https://bsky.app/profile/metadaddy.net) these days.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-05 23:04:38 +00:00
Hugh Gao
9b7b8e4a1a
community: make DashScope models support Partial Mode for text continuation. (#30108)
## Description
Make DashScope models support Partial Mode for text continuation.

ChatTongyi now supports continuing text from a prefix by adding a
"partial" argument to an AIMessage. The documentation is [Partial
Mode](https://help.aliyun.com/zh/model-studio/user-guide/partial-mode?spm=a2c4g.11186623.help-menu-2400256.d_1_0_0_8.211e5b77KMH5Pn&scm=20140722.H_2862210._.OR_help-T_cn~zh-V_1).
The API example is:
```py
import os
import dashscope

messages = [{
    "role": "user",
    "content": "Please continue the sentence 'Spring has arrived, and the earth' to express the beauty of spring and the author's joy"
},
{
    "role": "assistant",
    "content": "Spring has arrived, and the earth",
    "partial": True
}]
response = dashscope.Generation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen-plus',
    messages=messages,
    result_format='message',  
)

print(response.output.choices[0].message.content)
```

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-05 16:22:14 +00:00
黑牛
f0153414d5
Add request_id field to improve request tracking and debugging (for Tongyi model) (#30110)
- **Description**: Added the request_id field to the check_response
function to improve request tracking and debugging, applicable for the
Tongyi model.
- **Issue**: None
- **Dependencies**: None
- **Twitter handle**: None

- **Add tests and docs**: None

- **Lint and test**: Ran `make format`, `make lint`, and `make test` to
ensure the code meets formatting and testing requirements.
2025-03-05 11:03:47 -05:00
Manthan Surkar
1ee8aceaee
community: fix Jira API wrapper failing initialization with cloud param (#30117)
### **Description**  
Converts the boolean `jira_cloud` parameter in the Jira API Wrapper to a
string before initializing the Jira Client. Also adds tests for the
same.

### **Issue**  
[Jira API Wrapper
Bug](8abb65e138/libs/community/langchain_community/utilities/jira.py (L47))

```python
jira_cloud_str = get_from_dict_or_env(values, "jira_cloud", "JIRA_CLOUD")
jira_cloud = jira_cloud_str.lower() == "true"
```

The code above fails when `"jira_cloud"` is passed as a boolean, because
calling `.lower()` on a boolean raises an error.
Additionally, `False` cannot be passed explicitly, because
`get_from_dict_or_env` treats falsy values as missing and falls back to
environment variables.

Relevant code in `langchain_core`:  

[Source](https://github.com/thesmallstar/langchain/blob/master/.venv/lib/python3.13/site-packages/langchain_core/utils/env.py#L46)

```python
if isinstance(key, str) and key in data and data[key]:  # Here, data[key] is False
```

This PR fixes both issues.
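
One way to handle both cases is sketched below. It is a standalone illustration under stated assumptions, not the code merged in this PR (which converts the boolean to a string first): `resolve_bool_flag` is a hypothetical helper and `JIRA_CLOUD_SKETCH` is a made-up environment variable name.

```python
import os
from typing import Any, Dict


def resolve_bool_flag(values: Dict[str, Any], key: str, env_key: str,
                      default: bool = False) -> bool:
    """Read a flag that may be a bool or a string, preferring the explicit value.

    An explicit False is kept: only a missing key or None falls back to the
    environment variable.
    """
    if values.get(key) is not None:
        raw = values[key]
    else:
        raw = os.environ.get(env_key, default)
    if isinstance(raw, bool):
        return raw
    return str(raw).strip().lower() == "true"


os.environ["JIRA_CLOUD_SKETCH"] = "true"  # made-up variable for the demo
print(resolve_bool_flag({"jira_cloud": False}, "jira_cloud", "JIRA_CLOUD_SKETCH"))
print(resolve_bool_flag({}, "jira_cloud", "JIRA_CLOUD_SKETCH"))
```

The key design point is testing for presence (`is not None`) rather than truthiness, so a caller-supplied `False` is not overridden by the environment.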

### **Twitter Handle**  
[Manthan Surkar](https://x.com/manthan_surkar)
2025-03-05 10:49:25 -05:00
Adrián Panella
c599ba47d5
core(mermaid): fix error when 3+ subgraph levels (#29970) 2025-03-04 13:27:49 -05:00
Alexander Henlein
417efa30a6
docs: add Taiga Tool integration docs (#30042)
This PR adds documentation for the langchain-taiga Tool integration,
including an example notebook at
'docs/docs/integrations/tools/taiga.ipynb' and updates to
'libs/packages.yml' to track the new package.

Issue:
N/A

Dependencies:
None

Twitter handle:
N/A

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-03-04 17:51:20 +00:00
Mathias Marciano
5f0102242a
Fixed an issue with the OpenAI Assistant's 'retrieval' tool and adding support for the 'attachments' parameter (#30006)
PR Title:
langchain: add attachments support in OpenAIAssistantRunnable

PR Description:
This PR fixes an issue with the "retrieval" tool (internally named
"file_search") in the OpenAI Assistant by adding support for the
"attachments" parameter in the invoke method. This change allows files
to be linked to messages when they are inserted into threads, which is
essential for utilizing OpenAI's Retrieval Augmented Generation (RAG)
feature.

Issue:
N/A

Dependencies:
None

Twitter handle:
N/A

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-04 17:34:11 +00:00
Philippe PRADOS
4710c1fa8c
community[minor]: Fix regular expression in visualize and outlines modules. (#30002)
Fix invalid escape characters.
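
The commit does not show which patterns were changed. As a general illustration, assuming the problem was regex escapes written in ordinary (non-raw) string literals, the usual remedy is a raw string; the pattern and sample text below are made up.

```python
import re

# In a normal string literal, "\d" is an invalid escape sequence that newer
# Python versions warn about; a raw string passes the backslash to `re` as-is.
citation = re.compile(r"\[(\d+)\]")

numbers = citation.findall("see [12] and [7]")
print(numbers)
```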
2025-03-04 12:23:48 -05:00
ccurme
577c0d0715
community[patch]: release 0.3.19 (#30104) 2025-03-04 16:12:03 +00:00
ccurme
ba5ddb218f
anthropic[patch]: release 0.3.9 (#30103) 2025-03-04 10:53:55 -05:00
ccurme
9383a0536a
tests[patch]: release 0.3.13 (#30102) 2025-03-04 10:53:43 -05:00
ccurme
fb16c25920
langchain[patch]: release 0.3.20 (#30101) 2025-03-04 15:47:27 +00:00
ccurme
692a68bf1c
core[patch]: release 0.3.41 (#30100) 2025-03-04 15:08:57 +00:00
ccurme
484d945500
community[patch]: remove numpy cap for python < 3.12 (#30084) 2025-03-04 09:46:41 -05:00
ZhangShenao
8575d7491f
[Doc] Improve api doc (#30073)
- Update api_doc for `BaseMessage`
- add static method decorator for `retry_runnable`
2025-03-04 09:39:07 -05:00
Samuel Dion-Girardeau
ccb64e9f4f
docs: Fix typo in code samples for max_tokens_for_prompt (#30088)
- **Description:** Fix typo in code samples for max_tokens_for_prompt.
Code blocks had singular "token" but the method has plural "tokens".
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** N/A
2025-03-04 09:11:21 -05:00
ArrayPD
c671d54c6f
core: make with_alisteners() example workable. (#30059)
**Description:**

Five fixes to the example for with_alisteners() in
libs/core/langchain_core/runnables/base.py.
Replace the incoherent example output with the working example's output.

1. SyntaxError: unterminated string literal
    print(f"on start callback starts at {format_t(time.time())}
    correct as
    print(f"on start callback starts at {format_t(time.time())}")

2. SyntaxError: unterminated string literal
    print(f"on end callback starts at {format_t(time.time())}
    correct as
    print(f"on end callback starts at {format_t(time.time())}")

3. NameError: name 'Runnable' is not defined
    Fix as
    from langchain_core.runnables import Runnable

4. NameError: name 'asyncio' is not defined
    Fix as
    import asyncio

5. NameError: name 'format_t' is not defined.
    Implement format_t() as
    from datetime import datetime, timezone

    def format_t(timestamp: float) -> str:
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
2025-03-01 15:39:02 -05:00
cold-eye
7c175e3fda
Update ascend.py (#30060)
Add batch_size to fix OOM errors when embedding a large number of texts.

2025-03-01 14:10:41 -05:00
ccurme
3b066dc005
anthropic[patch]: allow structured output when thinking is enabled (#30047)
Structured output will currently always raise a BadRequestError when
Claude 3.7 Sonnet's `thinking` is enabled, because we rely on forced
tool use for structured output and this feature is not supported when
`thinking` is enabled.

Here we:
- Emit a warning if `with_structured_output` is called when `thinking`
is enabled.
- Raise `OutputParserException` if no tool calls are generated.

This is arguably preferable to raising an error in all cases.

```python
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int


llm = ChatAnthropic(
    model="claude-3-7-sonnet-latest",
    max_tokens=5_000,
    thinking={"type": "enabled", "budget_tokens": 2_000},
)
structured_llm = llm.with_structured_output(Person)  # <-- this generates a warning
```

```python
structured_llm.invoke("Alice is 30.")  # <-- works
```

```python
structured_llm.invoke("Hello!")  # <-- raises OutputParserException
```
2025-02-28 14:44:11 -05:00
ccurme
f8ed5007ea
anthropic, mistral: return model_name in response metadata (#30048)
Took a "census" of models supported by init_chat_model-- of those that
return model names in response metadata, these were the only two that
had it keyed under `"model"` instead of `"model_name"`.
2025-02-28 18:56:05 +00:00
Christophe Bornet
9e6ffd1264
core: Add ruff rules PTH (pathlib) (#29338)
See https://docs.astral.sh/ruff/rules/#flake8-use-pathlib-pth
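
For context, the PTH rules flag `os.path`-style calls and the builtin `open()` in favour of `pathlib`. A small before/after sketch; the function names are made up for the example and are not taken from langchain-core.

```python
import os
import tempfile
from pathlib import Path


def read_with_os_path(directory: str, name: str) -> str:
    # Style that the PTH rules would flag (os.path.join plus builtin open).
    with open(os.path.join(directory, name)) as f:
        return f.read()


def read_with_pathlib(directory: str, name: str) -> str:
    # Equivalent pathlib form.
    return (Path(directory) / name).read_text()


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "note.txt").write_text("hello")
    same = read_with_os_path(tmp, "note.txt") == read_with_pathlib(tmp, "note.txt")
    print(same)
```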

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-02-28 13:22:20 -05:00
TheSongg
86b364de3b
Add asynchronous generate interface (#30001)
- [ ] **PR title**: [langchain_community.llms.xinference]: Add
asynchronous generate interface

- [ ] **PR message**: The asynchronous generate interface supports both
streaming and non-streaming data.
          
        chain = prompt | llm
        async for chunk in chain.astream(input=user_input):
            yield chunk


- [ ] **Add tests and docs**:

        from langchain_community.llms import Xinference
        from langchain.prompts import PromptTemplate

        llm = Xinference(
            server_url="http://0.0.0.0:9997",  # replace with your Xinference server URL
            model_uid={model_uid},  # replace with the model UID returned from launching the model
            stream=True,
        )
        prompt = PromptTemplate(
            input_variables=["country"],
            template="Q: where can we visit in the capital of {country}? A:",
        )
        chain = prompt | llm
        async for chunk in chain.astream(input=user_input):
            yield chunk
2025-02-28 12:32:44 -05:00
Fakai Zhao
f07338d2bf
Implementing the MMR algorithm for OLAP vector storage (#30033)

-  **Implementing the MMR algorithm for OLAP vector storage**: 
  - Support Apache Doris and StarRocks OLAP database.
- Example: "vectorstore.as_retriever(search_type="mmr",
search_kwargs={"k": 10})"


- **Implementing the MMR algorithm for OLAP vector storage**: 
    - **Apache Doris
    - **StarRocks
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- **Add tests and docs**: 
- Example: "vectorstore.as_retriever(search_type="mmr",
search_kwargs={"k": 10})"
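
For readers unfamiliar with MMR (maximal marginal relevance), the selection rule behind `search_type="mmr"` can be sketched in plain Python. This is a generic illustration, not the Doris/StarRocks code from this PR; the function names, the toy vectors and the `lambda_mult` value are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            # Reward similarity to the query, penalize similarity to prior picks.
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): document 1 is a near-duplicate of document 0.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
picked = mmr_select(query, docs, k=2, lambda_mult=0.3)
```

With these toy vectors, a pure similarity ranking would return the two near-duplicates, whereas the MMR rule picks the closest document first and then prefers the more distinct one.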


---------

Co-authored-by: fakzhao <fakzhao@cisco.com>
2025-02-28 08:50:22 -05:00
Daniel Rauber
186cd7f1a1
community: PlaywrightURLLoader should wait for page load event before attempting to extract data (#30043)
## Description

The PlaywrightURLLoader should wait for a page to be loaded before
attempting to extract data.
2025-02-28 08:45:51 -05:00
ccurme
0dbcc1d099
docs: document anthropic features (#30030)
Update integrations page with extended thinking feature.

Update API reference with extended thinking and citations.
2025-02-27 19:37:04 -05:00
ccurme
6c7c8a164f
openai[patch]: add unit test (#30022)
Test `max_completion_tokens` is propagated to payload for
AzureChatOpenAI.
2025-02-27 11:09:17 -05:00
DamonXue
156a60013a
docs: fix tavily_search code-block format. (#30012)
This pull request includes a change to the `TavilySearchResults` class
in the `tool.py` file, which updates the code block format in the
documentation.

Documentation update:

*
[`libs/community/langchain_community/tools/tavily_search/tool.py`](diffhunk://#diff-e3b6a980979268b639c6a86e9b182756b0f7c7e9e5605e613bc0a72ea6aa5301L54-R59):
Changed the code block format from Python to JSON in the example
provided in the docstring.
2025-02-27 10:55:15 -05:00
kawamou
8977ac5ab0
community[fix]: Handle None value in raw_content from Tavily API response (#30021)
## **Description:**

When using the Tavily retriever with include_raw_content=True, the
retriever occasionally fails with a Pydantic ValidationError because
raw_content can be None.

The Document model in langchain_core/documents/base.py requires
page_content to be a non-None value, but the Tavily API sometimes
returns None for raw_content.

This PR fixes the issue by ensuring that even when raw_content is None,
an empty string is used instead:

```python
page_content=result.get("content", "")
if not self.include_raw_content
else (result.get("raw_content") or ""),
```
2025-02-27 10:53:53 -05:00
Lakindu Boteju
f69deee1bd
community: Add cost data for aws bedrock anthropic.claude-3-7 model (#30016)
This pull request includes updates to the
`libs/community/langchain_community/callbacks/bedrock_anthropic_callback.py`
file to add a new model version to the list of supported models.

Updates to supported models:

* Added support for the `anthropic.claude-3-7-sonnet-20250219-v1:0`
model with a rate of `0.003` for 1000 input tokens.
* Added support for the `anthropic.claude-3-7-sonnet-20250219-v1:0`
model with a rate of `0.015` for 1000 output tokens.

AWS Bedrock pricing reference : https://aws.amazon.com/bedrock/pricing
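
As a rough illustration of how per-1000-token rates like these turn into a cost figure, here is a minimal sketch. The dictionary names and the `estimate_cost` helper are hypothetical and are not taken from `bedrock_anthropic_callback.py`; only the model ID and the two rates come from the description above.

```python
# Rates in USD per 1,000 tokens, as listed in the PR description.
INPUT_COST_PER_1K = {"anthropic.claude-3-7-sonnet-20250219-v1:0": 0.003}
OUTPUT_COST_PER_1K = {"anthropic.claude-3-7-sonnet-20250219-v1:0": 0.015}


def estimate_cost(model_id, input_tokens, output_tokens):
    """Hypothetical helper: estimated USD cost of a single call."""
    return (
        input_tokens / 1000 * INPUT_COST_PER_1K[model_id]
        + output_tokens / 1000 * OUTPUT_COST_PER_1K[model_id]
    )


# 2,000 input tokens and 1,000 output tokens come to roughly 0.021 USD.
cost = estimate_cost("anthropic.claude-3-7-sonnet-20250219-v1:0", 2000, 1000)
```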
2025-02-27 09:51:52 -05:00
Lakindu Boteju
e0e9e560b3
PyMuPDF4LLM integration to LangChain (#29953)
## PyMuPDF4LLM integration to LangChain for PDF content extraction in
Markdown format

### Description

[PyMuPDF4LLM](https://github.com/pymupdf/RAG) makes it easier to extract
PDF content in Markdown format, needed for LLM & RAG applications.
(License: GNU Affero General Public License v3.0)


[langchain-pymupdf4llm](https://github.com/lakinduboteju/langchain-pymupdf4llm)
integrates PyMuPDF4LLM to LangChain as a Document Loader.
(License: MIT License)

This pull request introduces the integration of
[PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) into
the LangChain project as an integration package:
[`langchain-pymupdf4llm`](https://github.com/lakinduboteju/langchain-pymupdf4llm).
The most important changes include adding new Jupyter notebooks to
document the integration and updating the package configuration file to
include the new package.

### Documentation:

* `docs/docs/integrations/providers/pymupdf4llm.ipynb`: Added a new
Jupyter notebook to document the integration of `PyMuPDF4LLM` with
LangChain, including installation instructions and class imports.
* `docs/docs/integrations/document_loaders/pymupdf4llm.ipynb`: Added a
new Jupyter notebook to document the usage of `langchain-pymupdf4llm` as
a LangChain integration package in detail.

### Package registration:

* `libs/packages.yml`: Updated the package configuration file to include
the `langchain-pymupdf4llm` package.

### Additional information

* Related to: https://github.com/langchain-ai/langchain/pull/29848

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-26 15:59:12 -05:00
Dan Mirsky
d98c3f76c2
core[patch]: Fix FileCallbackHandler name resolution, Fixes #29941 (#29942)
- **Description:** Same changes as #26593 but for FileCallbackHandler
- **Issue:**  Fixes #29941
- **Dependencies:** None
- **Twitter handle:** None

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2025-02-26 14:54:24 -05:00
Christophe Bornet
b3885c124f
core: Add ruff rules TC (#29268)
See https://docs.astral.sh/ruff/rules/#flake8-type-checking-tc
Some fixes done for TC001,TC002 and TC003 but these rules are excluded
since they don't play well with Pydantic.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-26 19:39:05 +00:00
talos
9cd20080fc
community: Update SQLiteVec table trigger (#29914)
**Issue**: This trigger can only be used by the first table created.
Cannot create additional triggers for other tables.

**fixed**: Update the trigger name so that it can be used for new
tables.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-26 15:10:13 +00:00
ccurme
7562677f3f
langchain[patch]: delete erroneous lock file (#30007)
Picked up during merge.
2025-02-26 15:01:05 +00:00
Erick Friis
3c96012f5e
langchain: make numpy optional (#29182)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-26 14:35:24 +00:00
Artem Yankov
6177b9f9ab
community: add title, score and raw_content to tavily search results (#29995)
**Description:**

Tavily search results returned from the API include useful information such as
title, score and (optionally) raw_content, which the wrapper omits
even though it is documented there. Add this data to the result
structure.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-25 23:27:21 +00:00
Eugene Yurtsev
b525226531
core[patch]: version 0.3.40 (#29997)
Version 0.3.40 release
2025-02-25 23:09:40 +00:00
Vadym Barda
0fc50b82a0
core[patch]: allow passing description to @tool decorator (#29976) 2025-02-25 17:45:36 -05:00
Naveen SK
21bfc95e14
docs: Correct grammatical typos in various documentation files (#29983)
**Description:**
Fixed grammatical typos in various documentation files

**Issue:**
N/A

**Dependencies:**
N/A

**Twitter handle:**
@MrNaveenSK

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-02-25 19:13:31 +00:00
ccurme
1158d3134d
langchain[patch]: remove aiohttp (#29991)
My guess is this was left over from when `community` was in langchain.
2025-02-25 11:43:00 -05:00
ccurme
afd7888392
langchain[patch]: remove explicit dependency on tenacity (#29990)
Not used anywhere in `langchain`, already a dependency of
langchain-core.
2025-02-25 11:31:55 -05:00
ccurme
32704f0ad8
langchain: update extended test (#29988) 2025-02-25 14:58:20 +00:00
Yan
47e1a384f7
Writer partners integration docs (#29961)
**Documentation of Writer provider and additional features**
* [PyPi langchain-writer
web-page](https://pypi.org/project/langchain-writer/)
* [GitHub langchain-writer
repo](https://github.com/writer/langchain-writer)

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-24 19:30:09 -05:00
ccurme
79f5bbfb26
anthropic[patch]: release 0.3.8 (#29973) 2025-02-24 15:24:35 -05:00
ccurme
ded886f622
anthropic[patch]: support claude 3.7 sonnet (#29971) 2025-02-24 15:17:47 -05:00
Bagatur
d00d645829
docs[patch]: update disable_streaming docstring (#29968) 2025-02-24 18:40:31 +00:00
ccurme
b7a1705052
openai[patch]: release 0.3.7 (#29967) 2025-02-24 11:59:28 -05:00
ccurme
5437ee385b
core[patch]: release 0.3.39 (#29966) 2025-02-24 11:47:01 -05:00
ccurme
291a232fb8
openai[patch]: set global ssl context (#29932)
We set 
```python
global_ssl_context = ssl.create_default_context(cafile=certifi.where())
```
at the module-level and share it among httpx clients.
2025-02-24 11:25:16 -05:00
ccurme
9ce07980b7
core[patch]: pydantic 2.11 compat (#29963)
Resolves https://github.com/langchain-ai/langchain/issues/29951

Was able to reproduce the issue with Anthropic installing from pydantic
`main` and correct it with the fix recommended in the issue.

Thanks very much @Viicos for finding the bug and the detailed writeup!
2025-02-24 11:11:25 -05:00
ccurme
0d3a3b99fc
core[patch]: release 0.3.38 (#29962) 2025-02-24 15:04:53 +00:00
ccurme
b1a7f4e106
core, openai[patch]: support serialization of pydantic models in messages (#29940)
Resolves https://github.com/langchain-ai/langchain/issues/29003,
https://github.com/langchain-ai/langchain/issues/27264
Related: https://github.com/langchain-ai/langchain-redis/issues/52

```python
from langchain.chat_models import init_chat_model
from langchain.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
from pydantic import BaseModel

cache = SQLiteCache()

set_llm_cache(cache)

class Temperature(BaseModel):
    value: int
    city: str

llm = init_chat_model("openai:gpt-4o-mini")
structured_llm = llm.with_structured_output(Temperature)
```
```python
# 681 ms
response = structured_llm.invoke("What is the average temperature of Rome in May?")
```
```python
# 6.98 ms
response = structured_llm.invoke("What is the average temperature of Rome in May?")
```
2025-02-24 09:34:27 -05:00
ccurme
927ec20b69
openai[patch]: update system role to developer for o-series models (#29785)
Some o-series models will raise a 400 error for `"role": "system"`
(`o1-mini` and `o1-preview` will raise, `o1` and `o3-mini` will not).

Here we update `ChatOpenAI` to change the role to `"developer"` for all
model names matching `^o\d`.

We only make this change on the ChatOpenAI class (not BaseChatOpenAI).
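
The matching rule described above can be sketched as follows. This is an illustrative stand-in, not the `ChatOpenAI` code itself; `adapt_roles` and the plain-dict message format are assumptions made for the example.

```python
import re

# Pattern from the description: model names starting with "o" plus a digit.
O_SERIES = re.compile(r"^o\d")


def adapt_roles(model_name, messages):
    """Hypothetical helper: rename "system" to "developer" for o-series models."""
    if not O_SERIES.match(model_name):
        return messages
    return [
        {**m, "role": "developer"} if m.get("role") == "system" else m
        for m in messages
    ]


msgs = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]
o1_msgs = adapt_roles("o1-mini", msgs)
gpt_msgs = adapt_roles("gpt-4o-mini", msgs)
```

Note that `gpt-4o-mini` does not match because the pattern is anchored at the start of the name, so its messages pass through unchanged.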
2025-02-24 08:59:46 -05:00
Ahmed Tammaa
8b511a3a78
[Exception Handling] DeepSeek JSONDecodeError (#29758)
For context, please see #29626.

The DeepSeek integration is built on langchain_openai, and the failure
surfaces as a `json decode error`.

I added a handler that raises a more sensible error message: DeepSeek
API returned empty/invalid JSON.

Reproducing the issue is a bit challenging because it is inconsistent:
sometimes DeepSeek returns valid data, and at other times it returns
invalid data, which triggers the JSON decode error.

This PR only improves exception handling; it is not an ultimate fix for
the issue.
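
The general pattern, catching the decode error and re-raising it with a clearer message, might look like the sketch below. It uses only the standard library; `parse_api_body` is a hypothetical helper and does not reproduce the handler added in this PR.

```python
import json


def parse_api_body(raw_body):
    """Hypothetical helper: decode a response body, re-raising with a clearer message."""
    try:
        return json.loads(raw_body)
    except json.JSONDecodeError as e:
        # Keep the original document and position so debugging context is preserved.
        raise json.JSONDecodeError(
            "DeepSeek API returned empty/invalid JSON", e.doc, e.pos
        ) from e
```

Chaining with `from e` keeps the original traceback available, which matters when the failure is intermittent.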

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-23 15:00:32 -05:00
Julien Elkaim
e586bffe51
community: Repair embeddings/llamacpp's embed_query method (#29935)
**Description:** As noted in a comment on commit
[41b6a86](41b6a86bbe),
that commit introduced a bug that occurs when an embedding request
returns a non-nested list, which is typically the case for the model
**_nomic-embed-text_**.

- I added the unit test, and ran `make format`, `make lint` and `make
test` from the `community` package.
- No new dependency.
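
To make the nested versus non-nested distinction concrete, here is a minimal sketch of normalizing both shapes into a flat list of floats. `normalize_embedding` is a hypothetical helper, and taking the first inner vector of a nested result is an assumption; the actual fix in `embed_query` may differ.

```python
def normalize_embedding(raw):
    """Hypothetical helper: return a flat list of floats from either shape."""
    if raw and isinstance(raw[0], list):
        # Nested result: a list of vectors; this sketch takes the first one.
        return [float(x) for x in raw[0]]
    # Non-nested result: already a single vector.
    return [float(x) for x in raw]


flat = normalize_embedding([0.1, 0.2, 0.3])
nested = normalize_embedding([[0.1, 0.2, 0.3]])
```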

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-23 19:32:17 +00:00
Saraswathy Kalaiselvan
5ca4933b9d
docs: updated ChatLiteLLM model_kwargs description (#29937)
- [x] **PR title**: docs: (community) update ChatLiteLLM

- [x] **PR message**:
- **Description:** updated the description of the `model_kwargs` parameter,
which wrongly described temperature.
    - **Issue:** #29862 
    - **Dependencies:** N/A
    
- [x] **Add tests and docs**: N/A

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-23 19:27:13 +00:00
ccurme
512eb1b764
anthropic[patch]: update models for integration tests (#29938) 2025-02-23 14:23:48 -05:00
Christophe Bornet
f6d4fec4d5
core: Add ruff rules ANN (type annotations) (#29271)
See https://docs.astral.sh/ruff/rules/#flake8-annotations-ann
The interest compared to only mypy is that ruff is very fast at
detecting missing annotations.

ANN101 and ANN102 are deprecated so we ignore them 
ANN401 (no Any type) ignored to be in sync with mypy config

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-02-22 17:46:28 -05:00
Bagatur
979a991dc2
core[patch]: dont deep copy merge_message_runs (#28454)
afaict no need to deep copy here, if we merge messages then we convert
them to chunks first anyways
2025-02-22 21:56:45 +00:00
Mohammad Mohtashim
afa94e5bf7
_wait_for_run calling fix for OpenAIAssistantRunnable (#29927)
- **Description:** Fixed the `OpenAIAssistantRunnable` call of
`_wait_for_run`
- **Issue:**  #29923
2025-02-22 00:27:24 +00:00
Vadym Barda
437fe6d216
core[patch]: return ToolMessage from tools when tool call ID is empty string (#29921) 2025-02-21 11:53:15 -05:00
Taofiq Aiyelabegan
5ee8a8f063
[Integration]: Langchain-Permit (#29867)
## Which area of LangChain is being modified?
- This PR adds a new "Permit" integration to the `docs/integrations/`
folder.
- Introduces two new Tools (`LangchainJWTValidationTool` and
`LangchainPermissionsCheckTool`)
- Introduces two new Retrievers (`PermitSelfQueryRetriever` and
`PermitEnsembleRetriever`)
- Adds demo scripts in `examples/` showcasing usage.

## Description of Changes
- Created `langchain_permit/tools.py` for JWT validation and permission
checks with Permit.
- Created `langchain_permit/retrievers.py` for custom Permit-based
retrievers.
- Added documentation in `docs/integrations/providers/permit.ipynb` (or
`.mdx`) to explain setup, usage, and examples.
- Provided sample scripts in `examples/demo_scripts/` to illustrate
usage of these tools and retrievers.
- Ensured all code is linted and tested locally.

Thank you again for reviewing!

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-21 10:59:00 -05:00
Jean-Philippe Dournel
ebe38baaf9
community/mlx_pipeline: fix crash at mlx call (#29915)
- **Description:** 
Since mlx_lm 0.20, all calls to mlx crash because the way parameters are
passed to the `generate` and `generate_step` methods was deprecated.
The parameters top_p, temp, repetition_penalty and repetition_context_size
are no longer passed directly to those methods but are wrapped into
"sampler" and "logit_processor".


- **Dependencies:** mlx_lm (optional)

-  **Tests:** 
I've added a new test to the existing test file:
tests/integration_tests/llms/test_mlx_pipeline.py

---------

Co-authored-by: Jean-Philippe Dournel <jp@insightkeeper.io>
2025-02-21 09:14:53 -05:00
ccurme
1fa9f6bc20
docs: build mongo in api ref (#29908) 2025-02-20 19:58:35 -05:00
Chaunte W. Lacewell
d972c6d6ea
partners: add langchain-vdms (#29857)
**Description:** Deprecate vdms in community, add integration
langchain-vdms, and update any related files
**Issue:** n/a
**Dependencies:** langchain-vdms
**Twitter handle:** n/a

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-20 19:48:46 -05:00
Mohammad Mohtashim
8293142fa0
mistral[patch]: support model_kwargs (#29838)
- **Description:** Frequency_penalty added as a client parameter
- **Issue:** #29803

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-20 18:47:39 -05:00
ccurme
924d9b1b33
cli[patch]: fix retriever template (#29907)
Chat model tabs don't render correctly in .ipynb template.
2025-02-20 17:51:19 +00:00
Brayden Zhong
a70f31de5f
Community: RankLLMRerank AttributeError (Handle list-based rerank results) (#29840)
# community: Fix AttributeError in RankLLMRerank (`list` object has no
attribute `candidates`)

## **Description**
This PR fixes an issue in `RankLLMRerank` where reranking fails with the
following error:

```
AttributeError: 'list' object has no attribute 'candidates'
```

The issue arises because `rerank_batch()` returns a `List[Result]`
instead of an object containing `.candidates`.

### **Changes Introduced**
- Adjusted `compress_documents()` to support both:
  - Old API format: `rerank_results.candidates`
  - New API format: `rerank_results` as a list
  - Also fix wrong .txt location parsing while I was at it.
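
Supporting both result shapes can be sketched as below. This is an illustration only: `extract_candidates` is a hypothetical helper, the stand-in objects are built with `SimpleNamespace`, and it assumes each element of the list form exposes `.candidates`, which the description does not state explicitly.

```python
from types import SimpleNamespace


def extract_candidates(rerank_results):
    """Hypothetical helper: return candidates from either result shape."""
    if hasattr(rerank_results, "candidates"):
        # Old shape: a single result object exposing `.candidates`.
        return rerank_results.candidates
    # New shape: a list of results; use the first one for a single query.
    return rerank_results[0].candidates if rerank_results else []


old_style = SimpleNamespace(candidates=["doc-a", "doc-b"])
new_style = [SimpleNamespace(candidates=["doc-a", "doc-b"])]
old_result = extract_candidates(old_style)
new_result = extract_candidates(new_style)
```

Checking for the attribute first keeps the old format working, since a plain list has no `candidates` attribute.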

---

## **Issue**
Fixes **AttributeError** in `RankLLMRerank` when using
`compression_retriever.invoke()`. The issue is observed when
`rerank_batch()` returns a list instead of an object with `.candidates`.

**Relevant log:**
```
AttributeError: 'list' object has no attribute 'candidates'
```

## **Dependencies**
- No additional dependencies introduced.

---

## **Checklist**
- [x] **Backward compatible** with previous API versions
- [x] **Tested** locally with different RankLLM models
- [x] **No new dependencies introduced**
- [x] **Linted** with `make format && make lint`
- [x] **Ready for review**

---

## **Testing**
- Ran `compression_retriever.invoke(query)`

## **Reviewers**
If no review within a few days, please **@mention** one of:
- @baskaryan
- @efriis
- @eyurtsev
- @ccurme
- @vbarda
- @hwchase17
2025-02-20 12:38:31 -05:00
Levon Ghukasyan
ec403c442a
Separate deepale vector store (#29902)
---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-20 17:37:19 +00:00
Jorge Piedrahita Ortiz
3acf842e35
core: add sambanova chat models to load module mapping (#29855)
- **Description:** add sambanova integration package chat models to load
module mapping, to allow serialization and deserialization
2025-02-20 12:30:50 -05:00
ccurme
d227e4a08e
mistralai[patch]: release 0.2.7 (#29906) 2025-02-20 17:27:12 +00:00
Hande
d8bab89e6e
community: add cognee retriever (#29878)
This PR adds a new cognee integration, knowledge graph based retrieval
enabling developers to ingest documents into cognee’s knowledge graph,
process them, and then retrieve context via CogneeRetriever.
It includes:
- langchain_cognee package with a CogneeRetriever class
- a test for the integration, demonstrating how to create, process, and
retrieve with cognee
- an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


Followed additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

Thank you for the review!

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-20 17:15:23 +00:00
dokato
92b415a9f6
community: Made some Jira fields optional for agent to work correctly (#29876)
**Description:** Two small changes have been proposed here:
(1)
Previous code assumes that every issue has a priority field. If an issue
lacks this field, the code will raise a KeyError.
Now, the code checks if priority exists before accessing it. If priority
is missing, it assigns None instead of crashing. This prevents runtime
errors when processing issues without a priority.

(2)

Also, if the "style" field is missing, the code throws a KeyError.
Using `.get("style", None)` instead retrieves the value safely if present.
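
Both changes follow the same defensive-lookup pattern, sketched below on plain dictionaries. The issue layout and the `summarize_issue` helper are hypothetical and only loosely modeled on a Jira payload.

```python
def summarize_issue(issue):
    """Hypothetical helper: read optional fields without raising KeyError."""
    fields = issue.get("fields", {})
    priority = fields.get("priority")  # None when the field is absent
    return {
        "key": issue.get("key"),
        "priority": priority["name"] if priority else None,
        "style": issue.get("style", None),
    }


with_priority = summarize_issue(
    {"key": "PRJ-1", "fields": {"priority": {"name": "High"}}, "style": "classic"}
)
without_priority = summarize_issue({"key": "PRJ-2", "fields": {}})
```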

**Issue:** #29875 

**Dependencies:** N/A
2025-02-20 12:10:11 -05:00
am-kinetica
ca7eccba1f
Handled a bug around empty query results differently (#29877)
- [ ] **Handled query records properly**: "community:
vectorstores/kinetica"

- [ ] **Bugfix for empty query results handling**: 
- **Description:** checked for the number of records returned by a query
before processing further
- **Issue:** resulted in an `AttributeError` earlier which has now been
fixed

@efriis
2025-02-20 12:07:49 -05:00
Antonio Pisani
2c403a3ea9
docs: Add langchain-prolog documentation (#29788)
I want to add documentation for a new integration with SWI-Prolog.

@hwchase17 check this out:

https://github.com/apisani1/langchain-prolog/tree/main/examples/travel_agent

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-20 11:50:28 -05:00
Marlene
be7fa920fa
Partner: Azure AI Langchain Docs and Package Registry (#29879)
This PR adds documentation for the Azure AI package in Langchain to the
main mono-repo

No issue connected or updated dependencies.

Utilises existing tests and makes updates to the docs

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-20 14:35:26 +00:00
Hankyeol Kyung
2dd0ce3077
openai: Update reasoning_effort arg documentation (#29897)
**Description:** Update docstring for `reasoning_effort` argument to
specify that it applies to reasoning models only (e.g., OpenAI o1 and
o3-mini), clarifying its supported models.
**Issue:** None
**Dependencies:** None
2025-02-20 09:03:42 -05:00
ccurme
ed3c2bd557
core[patch]: set version="v2" as default in astream_events (#29894) 2025-02-19 23:21:37 +00:00
Fabian Blatz
a2d05a376c
community: ConfluenceLoader: add a filter method for attachments (#29882)
Adds an `attachment_filter_func` parameter to the ConfluenceLoader class
which can be used to determine which files are indexed. This is useful
if you are interested in excluding files based on their media type or
other metadata.
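
A filter of this kind might look like the sketch below. It assumes, without confirmation from this commit message, that the callable receives one attachment as a dictionary and returns True to keep it, and that the media type sits under `metadata.mediaType`; check the loader's API reference before relying on either.

```python
ALLOWED_MEDIA_TYPES = {"application/pdf", "image/png"}


def only_pdfs_and_pngs(attachment):
    """Hypothetical predicate: True if the attachment should be indexed."""
    # The key layout below is an assumption about the attachment payload.
    media_type = attachment.get("metadata", {}).get("mediaType")
    return media_type in ALLOWED_MEDIA_TYPES


keep = only_pdfs_and_pngs({"title": "spec.pdf", "metadata": {"mediaType": "application/pdf"}})
skip = only_pdfs_and_pngs({"title": "clip.mp4", "metadata": {"mediaType": "video/mp4"}})
```

Under those assumptions, the predicate would be passed as `attachment_filter_func=only_pdfs_and_pngs` when constructing the loader.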
2025-02-19 18:20:45 -05:00
ccurme
9ed47a4d63
community[patch]: release 0.3.18 (#29896) 2025-02-19 20:13:00 +00:00
ccurme
92889edafd
core[patch]: release 0.3.37 (#29895) 2025-02-19 20:04:35 +00:00
ccurme
ffd6194060
core[patch]: de-beta rate limiters (#29891) 2025-02-19 19:19:59 +00:00
ccurme
fb4c8423f0
docs: fix builds (#29890)
Missed in https://github.com/langchain-ai/langchain/pull/29889
2025-02-19 13:35:59 -05:00
ccurme
68b13e5172
pinecone: delete from monorepo (#29889)
This now lives in https://github.com/langchain-ai/langchain-pinecone
2025-02-19 12:55:15 -05:00
Erick Friis
6c1e21d128
core: basemessage.text() (#29078) 2025-02-18 17:45:44 -08:00
Eugene Yurtsev
8e5074d82d
core: release 0.3.36 (#29869)
Release 0.3.36
2025-02-18 19:51:43 +00:00
Vadym Barda
d04fa1ae50
core[patch]: allow passing JSON schema as args_schema to tools (#29812) 2025-02-18 14:44:31 -05:00
ccurme
5034a8dc5c
xai[patch]: release 0.2.1 (#29854) 2025-02-17 14:30:41 -05:00
ccurme
83dcef234d
xai[patch]: support dedicated structured output feature (#29853)
https://docs.x.ai/docs/guides/structured-outputs

Interface appears identical to OpenAI's.
```python
from langchain.chat_models import init_chat_model
from pydantic import BaseModel

class Joke(BaseModel):
    setup: str
    punchline: str

llm = init_chat_model("xai:grok-2").with_structured_output(
    Joke, method="json_schema"
)
llm.invoke("Tell me a joke about cats.")
```
2025-02-17 14:19:51 -05:00
ccurme
9d6fcd0bfb
infra: add xai to scheduled testing (#29852) 2025-02-17 18:59:45 +00:00
ccurme
8a3b05ae69
langchain[patch]: release 0.3.19 (#29851) 2025-02-17 13:36:23 -05:00
ccurme
c9061162a1
langchain[patch]: add xai to extras (#29850) 2025-02-17 17:49:34 +00:00
Bagatur
1acf57e9bd
langchain[patch]: init_chat_model xai support (#29849) 2025-02-17 09:45:39 -08:00
hsm207
037b129b86
weaviate: Add-deprecation-warning (#29757)
- **Description:** add deprecation warning when using weaviate from
langchain_community
  - **Issue:** NA
  - **Dependencies:** NA
  - **Twitter handle:** NA

---------

Signed-off-by: hsm207 <hsm207@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-16 21:42:18 -05:00
Đỗ Quang Minh
cd198ac9ed
community: add custom model for OpenAIWhisperParser (#29831)
Add `model` properties for OpenAIWhisperParser. Defaulted to `whisper-1`
(previous value).
Please help me update the docs and other related components of this
repo.
2025-02-16 21:26:07 -05:00
Cole McIntosh
6874c9c1d0
docs: add notebook for langchain-salesforce package (#29800)
**Description:**  
This PR adds a Jupyter notebook that explains the features,
installation, and usage of the
[`langchain-salesforce`](https://github.com/colesmcintosh/langchain-salesforce)
package. The notebook includes:
- Setup instructions for configuring Salesforce credentials  
- Example code demonstrating common operations such as querying,
describing objects, creating, updating, and deleting records

**Issue:**  
N/A

**Dependencies:**  
No new dependencies are required.

**Tests and Docs:**  
- Added an example notebook demonstrating the usage of the
`langchain-salesforce` package, located in `docs/docs/integrations`.

**Lint and Test:**  
- Ran `make format`, `make lint`, and `make test` successfully.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-16 08:34:57 -05:00
Jan Heimes
60f58df5b3
community: add top_k as param to Needle Retriever (#29821)
This PR adds top_k as a param to the Needle Retriever. By default we use
top 10.



2025-02-16 08:30:52 -05:00
Jesus Fernandez Bes
1dfac909d8
community: Adding IN Operator to AzureCosmosDBNoSQLVectorStore (#29805)
- **Description**: I have added a new operator to the operator map with
key `$in` and value `IN`, so that you can define filters using lists as
values. This was already anticipated, but because the IN operator was not
in the map, such filters could not be used.
- **Issue**: Fixes #29804.
- **Dependencies**: No extra.
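
The idea of an operator map with an `$in` entry can be sketched as follows. The map contents other than `$in`, the `build_condition` helper and the `c.` alias are assumptions for illustration and do not reproduce the `AzureCosmosDBNoSQLVectorStore` code. The values are inlined only to keep the sketch short; real queries should use parameters.

```python
# Hypothetical operator map; only the "$in" -> "IN" pair comes from the PR text.
OPERATOR_MAP = {"$eq": "=", "$gt": ">", "$in": "IN"}


def build_condition(field, op, value):
    """Hypothetical helper: render one filter clause as a SQL-like string."""
    sql_op = OPERATOR_MAP[op]
    if op == "$in":
        # A list value becomes a parenthesized, comma-separated group.
        items = ", ".join(repr(v) for v in value)
        return f"c.{field} {sql_op} ({items})"
    return f"c.{field} {sql_op} {value!r}"


in_clause = build_condition("color", "$in", ["red", "blue"])
gt_clause = build_condition("size", "$gt", 3)
```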
2025-02-15 21:44:54 -05:00
Wahed Hemati
8901b113c3
docs: add Discord integration docs (#29822)
This PR adds documentation for the `langchain-discord-shikenso`
integration, including an example notebook at
`docs/docs/integrations/tools/discord.ipynb` and updates to
`libs/packages.yml` to track the new package.

  **Issue:**  
  N/A

  **Dependencies:**  
  None

  **Twitter handle:**  
  N/A

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-15 21:43:45 -05:00
Krishna Kulkarni
a98c5f1c4b
langchain_community: add image support to DuckDuckGoSearchAPIWrapper (#29816)
- [ ] **PR title**: langchain_community: add image support to
DuckDuckGoSearchAPIWrapper

- **Description:** This PR enhances the DuckDuckGoSearchAPIWrapper
within the langchain_community package by introducing support for image
searches. The enhancement includes:
  - Adding a new method _ddgs_images to handle image search queries.
  - Updating the run and results methods to process and return image
    search results appropriately.
  - Modifying the source parameter to accept "images" as a valid option,
    alongside "text" and "news".
- **Dependencies:** No additional dependencies are required for this
change.
2025-02-15 21:32:14 -05:00
Iris Liu
0d9f0b4215
docs: updates Chroma integration API ref docs (#29826)
- Description: updates Chroma integration API ref docs
- Issue: #29817
- Dependencies: N/A
- Twitter handle: @irieliu

Co-authored-by: “Iris <“liuirisny@gmail.com”>
2025-02-15 21:05:21 -05:00
ccurme
3fe7c07394
openai[patch]: release 0.3.6 (#29824) 2025-02-15 13:53:35 -05:00
ccurme
65a6dce428
openai[patch]: enable streaming for o1 (#29823)
Verified streaming works for the `o1-2024-12-17` snapshot as well.
2025-02-15 12:42:05 -05:00
Christophe Bornet
3dffee3d0b
all: Bump blockbuster version to 1.5.18 (#29806)
Has fixes for running on Windows and non-CPython runtimes.
2025-02-14 07:55:38 -08:00
ccurme
d9a069c414
tests[patch]: release 0.3.12 (#29797) 2025-02-13 23:57:44 +00:00
ccurme
e4f106ea62
groq[patch]: remove xfails (#29794)
These appear to pass.
2025-02-13 15:49:50 -08:00
Erick Friis
f34e62ef42
packages: add langchain-xai (#29795)
wasn't registered per the contribution guide:
https://python.langchain.com/docs/contributing/how_to/integrations/
2025-02-13 15:36:41 -08:00
ccurme
49cc6106f7
tests[patch]: fix query for test_tool_calling_with_no_arguments (#29793) 2025-02-13 23:15:52 +00:00
Erick Friis
1a225fad03
multiple: fix uv path deps (#29790)
file:// format wasn't working with updates - it doesn't install as an
editable dep

move to tool.uv.sources with path= instead
2025-02-13 21:32:34 +00:00
Erick Friis
ff13384eb6
packages: update counts, add command (#29789) 2025-02-13 20:45:25 +00:00
HackHuang
76d32754ff
core : update the class docs of InMemoryVectorStore in in_memory.py (#29781)
- **Description:** Add a new section on inspecting `store` to the class docs
in in_memory.py. It is useful for beginners.
```python
Check Documents:
    .. code-block:: python
    
        for doc in vector_store.store.values():
            print(doc)
```

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-13 16:41:47 +00:00
Mohammad Mohtashim
96ad09fa2d
(Community): Added API Key for Jina Search API Wrapper (#29622)
- **Description:** Simple change for adding the API Key for Jina Search
API Wrapper
- **Issue:** #29596
2025-02-12 20:12:07 -08:00
ccurme
f1c66a3040
docs: minor fix to provider table (#29771)
Langfair renders as LangfAIr
2025-02-13 04:06:58 +00:00
Jakub Kopecký
c8cb7c25bf
docs: update apify integration (#29553)
**Description:** Fixed and updated Apify integration documentation to
use the new [langchain-apify](https://github.com/apify/langchain-apify)
package.
**Twitter handle:** @apify
2025-02-12 20:02:55 -08:00
ccurme
16fb1f5371
chroma[patch]: release 0.2.2 (#29769)
Resolves https://github.com/langchain-ai/langchain/issues/29765
2025-02-13 02:39:16 +00:00
Mohammad Mohtashim
2310847c0f
(Chroma): Small Fix in add_texts when checking for embeddings (#29766)
- **Description:** Small fix in `add_texts` to make sure embedding
nullability is checked properly.
- **Issue:** #29765

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-13 02:26:13 +00:00
Eric Pinzur
716fd89d8e
docs: contributed Graph RAG Retriever integration (#29744)
**Description:** 

This adds the `Graph RAG` Retriever integration documentation, per
https://python.langchain.com/docs/contributing/how_to/integrations/.

* The integration exists in this public repository:
https://github.com/datastax/graph-rag
* We've implemented the standard langchain tests for retrievers:
https://github.com/datastax/graph-rag/blob/main/packages/langchain-graph-retriever/tests/test_langchain.py
* Our integration is published to PyPi:
https://pypi.org/project/langchain-graph-retriever/
2025-02-12 18:25:48 -08:00
Sunish Sheth
f42dafa809
Deprecating sql_database access for creating UC functions for agent tools (#29745)
---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-02-13 02:24:44 +00:00
Thor 雷神 Schaeff
a0970d8d7e
[WIP] chore: update ElevenLabs tool. (#29722)
---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-13 01:54:34 +00:00
Chaymae El Aattabi
4b08a7e8e8
Fix #29759: Use local chunk_size_ for looping in embed_documents (#29761)
This fix ensures that the chunk size is correctly determined when
processing text embeddings. Previously, the code did not properly handle
cases where chunk_size was None, potentially leading to incorrect
chunking behavior.

Now, chunk_size_ is explicitly set to either the provided chunk_size or
the default self.chunk_size, ensuring consistent chunking. This update
improves reliability when processing large text inputs in batches and
prevents unintended behavior when chunk_size is not specified.
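
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the patched code: `SketchEmbeddings`, `_embed_batch`, and `batch_sizes` are hypothetical names, and the "embedding" is a placeholder.

```python
from typing import List, Optional


class SketchEmbeddings:
    """Hypothetical stand-in for an embeddings class; not the real API."""

    def __init__(self, chunk_size: int = 2) -> None:
        self.chunk_size = chunk_size  # instance-level default
        self.batch_sizes: List[int] = []  # recorded only for illustration

    def _embed_batch(self, batch: List[str]) -> List[List[float]]:
        # Placeholder "embedding": the length of each text.
        self.batch_sizes.append(len(batch))
        return [[float(len(text))] for text in batch]

    def embed_documents(
        self, texts: List[str], chunk_size: Optional[int] = None
    ) -> List[List[float]]:
        # Resolve the effective size once, then use it for the loop as well.
        chunk_size_ = chunk_size or self.chunk_size
        embeddings: List[List[float]] = []
        for start in range(0, len(texts), chunk_size_):
            embeddings.extend(self._embed_batch(texts[start : start + chunk_size_]))
        return embeddings
```

With `chunk_size=None` the loop steps by the instance default instead of failing or chunking inconsistently.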

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-13 01:28:26 +00:00
Sunish Sheth
043d78d85d
Deprecate langchain community ucfunctiontoolkit in favor of databricks_langchain (#29746)
2025-02-12 15:50:35 -08:00
Hugues Chocart
e4eec9e9aa
community: add langchain-abso documentation (#29739)
Add the documentation for the community package `langchain-abso`. It
provides a new Chat Model class, that uses https://abso.ai

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2025-02-12 19:57:33 +00:00
ccurme
e61f463745
core[patch]: release 0.3.35 (#29764) 2025-02-12 18:13:10 +00:00
Nuno Campos
fe59f2cc88
core: Fix output of convert_messages when called with BaseMessage.model_dump() (#29763)
- additional_kwargs was being nested twice
- example, response_metadata was placed inside additional_kwargs
2025-02-12 10:05:33 -08:00
Jacob Lee
f4e3e86fbb
feat(langchain): Infer o3 modelstrings passed to init_chat_model as OpenAI (#29743) 2025-02-11 16:51:41 -08:00
Mohammad Mohtashim
9f3bcee30a
(Community): Adding Structured Support for ChatPerplexity (#29361)
- **Description:** Adding Structured Support for ChatPerplexity
- **Issue:** #29357
- This is implemented as per the Perplexity official docs:
https://docs.perplexity.ai/guides/structured-outputs

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-02-11 15:51:18 -08:00
Jawahar S
994c5465e0
feat: add support for IBM WatsonX AI chat models (#29688)
**Description:** Updated init_chat_model to support Granite models
deployed on IBM WatsonX
**Dependencies:**
[langchain-ibm](https://github.com/langchain-ai/langchain-ibm)

Tagging @baskaryan @efriis for review when you get a chance.
2025-02-11 15:34:29 -08:00
Shailendra Mishra
c7d74eb7a3
Oraclevs integration (#29723)
community: langchain_community/vectorstore/oraclevs.py

- **Description:** Refactored code to allow a connection or a connection
pool.
- **Issue:** Normally an idle connection is terminated by the server-side
listener at timeout, so the user has to re-instantiate the vector
store. The timeout for a single connection is not configurable. The
solution is to use a connection pool, where the user can specify a
timeout and the connections are managed by the pool.
- **Dependencies:** None
- **Tests and docs:** This is not a new integration. A user can
pass either a connection or a connection pool. The determination of what
is passed is made at run time. Everything should work as before.
- **Lint and test:** Already done.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-11 14:56:55 -08:00
ccurme
42ebf6ae0c
deepseek[patch]: release 0.1.2 (#29742) 2025-02-11 11:53:43 -08:00
ccurme
ec55553807
pinecone[patch]: release 0.2.3 (#29741) 2025-02-11 19:27:39 +00:00
ccurme
001cf99253
pinecone[patch]: add support for python 3.13 (#29737) 2025-02-11 11:20:21 -08:00
ccurme
ba8f752bf5
openai[patch]: release 0.3.5 (#29740) 2025-02-11 19:20:11 +00:00
ccurme
9477f49409
openai, deepseek: make _convert_chunk_to_generation_chunk an instance method (#29731)
1. Make `_convert_chunk_to_generation_chunk` an instance method on
BaseChatOpenAI
2. Override on ChatDeepSeek to add `"reasoning_content"` to message
additional_kwargs.

Resolves https://github.com/langchain-ai/langchain/issues/29513
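
The override pattern in the two steps above might look roughly like this. It is a sketch under assumptions: `BaseChatSketch` and `DeepSeekSketch` are hypothetical stand-ins, the chunk is modeled as a plain dict, and the payload shape is assumed rather than taken from the real SDK.

```python
from typing import Any, Dict, Optional


class BaseChatSketch:
    """Hypothetical stand-in for the base chat class."""

    def _convert_chunk_to_generation_chunk(
        self, chunk: Dict[str, Any]
    ) -> Optional[Dict[str, Any]]:
        choices = chunk.get("choices") or []
        if not choices:
            return None
        delta = choices[0].get("delta", {})
        return {"content": delta.get("content") or "", "additional_kwargs": {}}


class DeepSeekSketch(BaseChatSketch):
    """Hypothetical subclass that adds reasoning content to the message."""

    def _convert_chunk_to_generation_chunk(
        self, chunk: Dict[str, Any]
    ) -> Optional[Dict[str, Any]]:
        # Reuse the base conversion, then attach the extra field if present.
        message = super()._convert_chunk_to_generation_chunk(chunk)
        if message is None:
            return None
        reasoning = chunk["choices"][0].get("delta", {}).get("reasoning_content")
        if reasoning is not None:
            message["additional_kwargs"]["reasoning_content"] = reasoning
        return message
```

Making the converter an instance method is what lets the subclass extend it through `super()` instead of duplicating the base logic.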
2025-02-11 11:13:23 -08:00
ccurme
d0c2dc06d5
mongodb[patch]: fix link in readme (#29738) 2025-02-11 18:19:59 +00:00
zzaebok
3b3d52206f
community: change wikidata rest api version from v0 to v1 (#29708)
**Description:**

According to the [wikidata
documentation](https://www.wikidata.org/wiki/Wikidata_talk:REST_API),
Wikibase REST API version 1 (stable) was released on November 11, 2024.
Their guidance is to use the new v1 API, which in almost all cases only
requires replacing v0 in the routes with v1.
So I changed WIKIDATA_REST_API_URL from v0 to v1 for stable usage.

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-02-10 17:12:38 -08:00
ccurme
4a389ef4c6
community: fix extended testing (#29715)
v0.3.100 of premai sdk appears to break on import:
89d9276cbf/premai/api/__init__.py (L230)
2025-02-10 16:57:34 -08:00
Bhav Sardana
624216aa64
community:Fix for Pydantic model validator of GoogleApiYoutubeLoader (#29694)
- **Description:** Community: bugfix for the Pydantic model validator of
GoogleApiYoutubeLoader
- **Issue:** #29165, #27432 
Fix is similar to #29346
2025-02-10 08:57:58 -05:00
Changyong Um
60740c44c5
community: Add configurable text key for indexing and the retriever in Pinecone Hybrid Search (#29697)
**issue**

In Langchain, the original content is generally stored under the `text`
key. However, the `PineconeHybridSearchRetriever` searches the `context`
field in the metadata and cannot change this key. To address this, I
have modified the code to allow changing the key to something other than
context.

In my opinion, following Langchain's conventions, the `text` key seems
more appropriate than `context`. However, since I wasn't sure about the
author's intent, I have left the default value as `context`.
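
As a rough illustration of the configurable key, assuming search matches are plain dicts with a `metadata` mapping (the helper name `extract_texts` is hypothetical and not part of the retriever):

```python
from typing import Any, Dict, List


def extract_texts(
    matches: List[Dict[str, Any]], text_key: str = "context"
) -> List[str]:
    """Read the stored text from each match's metadata under `text_key`."""
    return [match["metadata"][text_key] for match in matches]
```

Keeping `"context"` as the default preserves existing behavior, while callers who index under `text` can pass `text_key="text"`.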
2025-02-10 08:56:37 -05:00
manukychen
3de445d521
using getattr and default value to prevent 'OpenSearchVectorSearch' has no attribute 'bulk_size' (#29682)
- Description: Use `getattr` with a default value of 500 when reading
cls.bulk_size, which prevents the error below:
Error: type object 'OpenSearchVectorSearch' has no attribute 'bulk_size'

- Issue: https://github.com/langchain-ai/langchain/issues/29071
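
A minimal sketch of the `getattr` fallback, using hypothetical classes rather than the real vector store:

```python
class StoreWithoutBulkSize:
    """Hypothetical class that never defines `bulk_size`."""


class StoreWithBulkSize:
    bulk_size = 100


def resolve_bulk_size(cls: type, default: int = 500) -> int:
    # getattr with a default avoids AttributeError when the attribute is absent.
    return getattr(cls, "bulk_size", default)
```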
2025-02-08 14:39:57 -05:00
Yao Tianjia
5d581ba22c
langchain: support the situation when action_input is null in json output_parser (#29680)
Description:
This PR fixes handling of a null action_input in
[langchain.agents.output_parser]. Previously, passing null as
action_input could raise an OutputParserException with an unclear error
message, leaving the LLM unable to tell how to correct the action. The
changes include:

- Added null-check validation before processing action_input
- Implemented proper fallback behavior with default values
- Maintained backward compatibility with existing implementations

Error Examples:
```
{
  "action":"some action",
  "action_input":null
}
```
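
A null check of this kind could be sketched as below. The function name `parse_action` and the choice of an empty dict as the fallback are assumptions for illustration; the commit message does not state which default the parser uses.

```python
import json
from typing import Any, Tuple


def parse_action(text: str) -> Tuple[str, Any]:
    """Parse an agent action, tolerating a null `action_input`."""
    payload = json.loads(text)
    action_input = payload.get("action_input")
    if action_input is None:
        action_input = {}  # assumed default; the real fallback may differ
    return payload["action"], action_input
```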

Issue:
None

Dependencies:
None
2025-02-07 22:01:01 -05:00
Philippe PRADOS
beb75b2150
community[minor]: 05 - Refactoring PyPDFium2 parser (#29625)
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the
PyPDFium2 parser.

For more details, see
https://github.com/langchain-ai/langchain/pull/28970.
2025-02-07 21:31:12 -05:00
Christophe Bornet
723031d548
community: Bump ruff version to 0.9 (#29206)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-08 01:21:10 +00:00
Christophe Bornet
30f6c9f5c8
community: Use Blockbuster to detect blocking calls in asyncio during tests (#29609)
Same as https://github.com/langchain-ai/langchain/pull/29043 for
langchain-community.

**Dependencies:**
- blockbuster (test)

**Twitter handle:** cbornet_

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-08 01:10:39 +00:00
Christophe Bornet
3a57a28daa
langchain: Use Blockbuster to detect blocking calls in asyncio during tests (#29616)
Same as https://github.com/langchain-ai/langchain/pull/29043 for the
langchain package.

**Dependencies:**
- blockbuster (test)

**Twitter handle:** cbornet_

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-08 01:08:15 +00:00
Keenan Pepper
c67d473397
core: Make abatch_as_completed respect max_concurrency (#29426)
- **Description:** Add tests for respecting max_concurrency and
implement it for abatch_as_completed so that test passes
- **Issue:** #29425
- **Dependencies:** none
- **Twitter handle:** keenanpepper
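
One common way to bound concurrency while still yielding results as they complete is a semaphore around each call. The sketch below uses only `asyncio`; `run_as_completed` is a hypothetical helper and is not the implementation in langchain-core.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional, Tuple


async def run_as_completed(
    func: Callable[[Any], Awaitable[Any]],
    inputs: List[Any],
    max_concurrency: Optional[int] = None,
) -> List[Tuple[int, Any]]:
    """Run `func` over inputs, returning (index, result) in completion order."""
    semaphore = asyncio.Semaphore(max_concurrency) if max_concurrency else None

    async def run_one(index: int, item: Any) -> Tuple[int, Any]:
        if semaphore is None:
            return index, await func(item)
        async with semaphore:  # at most max_concurrency calls run at once
            return index, await func(item)

    tasks = [asyncio.create_task(run_one(i, item)) for i, item in enumerate(inputs)]
    results: List[Tuple[int, Any]] = []
    for finished in asyncio.as_completed(tasks):
        results.append(await finished)
    return results
```

Returning the input index with each result lets the caller match outputs to inputs even though they arrive out of order.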
2025-02-07 16:51:22 -08:00
Aaron V
dcfaae85d2
Core: Fix __add__ for concatting two BaseMessageChunk's (#29531)
Description:

The change allows you to use the overloaded `+` operator correctly when
`+`ing two BaseMessageChunk subclasses. Without this you *must*
instantiate a subclass for it to work.

Which feels... wrong. Base classes should be decoupled from subclasses
and should in no way depend on them.

Issue:

You can't `+` a BaseMessageChunk with a BaseMessageChunk

e.g. this will explode

```py
from langchain_core.outputs import (
    ChatGenerationChunk,
)
from langchain_core.messages import BaseMessageChunk


chunk1 = ChatGenerationChunk(
    message=BaseMessageChunk(
        type="customChunk",
        content="HI",
    ),
)

chunk2 = ChatGenerationChunk(
    message=BaseMessageChunk(
        type="customChunk",
        content="HI",
    ),
)

# this will throw
new_chunk = chunk1 + chunk2
```

In case anyone ran into this issue themselves, it's probably best to use
the AIMessageChunk:

a la 

```py
from langchain_core.outputs import (
    ChatGenerationChunk,
)
from langchain_core.messages import AIMessageChunk


chunk1 = ChatGenerationChunk(
    message=AIMessageChunk(
        content="HI",
    ),
)

chunk2 = ChatGenerationChunk(
    message=AIMessageChunk(
        content="HI",
    ),
)

# No explosion!
new_chunk = chunk1 + chunk2
```

Dependencies:

None!

Twitter handle: 
`aaron_vogler`

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-08 00:43:36 +00:00
Marlene
4fa3ef0d55
Community/Partner: Adding Azure community and partner user agent to better track usage in Python (#29561)
- This pull request includes various changes to add a `user_agent`
parameter to Azure OpenAI, Azure Search and Whisper in the Community and
Partner packages. This helps in identifying the source of API requests
so we can better track usage and help support the community better. I
will also be adding the user_agent to the new `langchain-azure` repo as
well.

- No connected issue or updated dependencies.
- Utilises existing tests and docs

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-07 23:28:30 +00:00
Ella Charlaix
c401254770
huggingface: Add ipex support to HuggingFaceEmbeddings (#29386)
ONNX and OpenVINO models are available by specifying the `backend`
argument (the model is loaded using `optimum`
https://github.com/huggingface/optimum)

```python
from langchain_huggingface import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings(
    model_name=model_id,
    model_kwargs={"backend": "onnx"},
)
```

With this PR we also enable the IPEX backend:

```python
from langchain_huggingface import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings(
    model_name=model_id,
    model_kwargs={"backend": "ipex"},
)
```
2025-02-07 15:21:09 -08:00
Bruno Alvisio
3eaf561561
core: Handle unterminated escape character when parsing partial JSON (#29065)
**Description**
Currently, when parsing a partial JSON, if a string ends with the escape
character, the whole key/value is removed. For example:

```
>>> from langchain_core.utils.json import parse_partial_json
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>> 
>>> parse_partial_json(my_str)
{'foo': 'bar'}
```

My expectation (and with this fix) would be for `parse_partial_json()`
to return:
```
>>> from langchain_core.utils.json import parse_partial_json
>>> 
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>> parse_partial_json(my_str)
{'foo': 'bar', 'baz': 'qux'}
```

Notes:
1. It could be argued that the current behavior is still desired.
2. I have experienced this issue when streaming output from an LLM and
a chunk happens to end with `\\`.
3. I haven't included tests. Will do if the change is accepted.
4. This is especially troublesome when this function is used by

187131c55c/libs/core/langchain_core/output_parsers/transform.py (L111)

since, for example, if the received sequence of chunks is
`{"foo": "b` , `ar\\` :

Then, the result of calling `self.parse_result()` is:
```
{"foo": "b"}
```
and the second time:
```
{}
```
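
The idea can be illustrated with a deliberately simplified closer for partial JSON. This is a sketch only: `close_partial_json` is a hypothetical helper, it handles just strings, objects, and arrays, and it is not the logic of `parse_partial_json`.

```python
import json


def close_partial_json(text: str) -> str:
    """Close an unfinished JSON string/object/array, dropping a dangling escape."""
    closers = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()
    if in_string:
        if escaped:
            text = text[:-1]  # drop the unterminated escape character
        text += '"'
    return text + "".join(reversed(closers))


print(json.loads(close_partial_json('{"foo": "bar", "baz": "qux\\')))
```

Dropping only the trailing backslash keeps the rest of the value, so the key is not lost while the stream is still in progress.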

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-07 23:18:21 +00:00
Viren
252cf0af10
docs: add LangFair as a provider (#29390)
**Description:**
- Add `docs/docs/providers/langfair.mdx`
- Register langfair in `libs/packages.yml`

**Twitter handle:** @LangFair

**Tests and docs**
1. Integration tests not needed as this PR only adds a .mdx file to
docs.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Dylan Bouchard <dylan.bouchard@cvshealth.com>
Co-authored-by: Dylan Bouchard <109233938+dylanbouchard@users.noreply.github.com>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-07 21:27:37 +00:00
Erick Friis
eb9eddae0c
docs: use init_chat_model (#29623) 2025-02-07 12:39:27 -08:00
ccurme
bff25b552c
community: release 0.3.17 (#29676) 2025-02-07 19:41:44 +00:00
ccurme
01314c51fa
langchain: release 0.3.18 (#29654) 2025-02-07 13:40:26 -05:00
ccurme
92e2239414
openai[patch]: make parallel_tool_calls explicit kwarg of bind_tools (#29669)
Improves discoverability and documentation.

cc @vbarda
2025-02-07 13:34:32 -05:00
Marc Ammann
5690575f13
openai: Removed tool_calls from completion chunk after other chunks have already been sent. (#29649)
- **Description:** Before sending a completion chunk at the end of an
OpenAI stream, removing the tool_calls as those have already been sent
as chunks.
- **Issue:** -
- **Dependencies:** -
- **Twitter handle:** -

@ccurme as mentioned in another PR

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-07 10:15:52 -05:00
Ikko Eltociear Ashimine
0d45ad57c1
community: update base_o365.py (#29657)
extention -> extension
2025-02-07 08:43:29 -05:00
Vincent Emonet
3645181d0e
qdrant: Add similarity_search_with_score_by_vector() function to the QdrantVectorStore (#29641)
Added `similarity_search_with_score_by_vector()` function to the
`QdrantVectorStore` class.

It is required when we want to query multiple times with the same
embeddings. It was present in the now-deprecated original `Qdrant`
vectorstore implementation, but was absent from the new one. It is also
implemented in a number of other `VectorStore` implementations.

I have added tests for this new function

Note that I also argued in this discussion that it should be part of the
general `VectorStore`
https://github.com/langchain-ai/langchain/discussions/29638
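
To show why searching by vector is useful, here is a small in-memory sketch that scores stored vectors against a precomputed embedding. The function names and the cosine metric are assumptions for illustration; the actual score depends on how the Qdrant collection is configured.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_with_score_by_vector(
    store: Dict[str, Sequence[float]], embedding: Sequence[float], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the top-k (doc_id, score) pairs for an already-computed embedding."""
    scored = [(doc_id, cosine(embedding, vector)) for doc_id, vector in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Because the embedding is passed in directly, the same vector can be reused across several queries without calling the embedding model again.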

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-07 00:55:58 +00:00
ccurme
488cb4a739
anthropic: release 0.3.7 (#29653) 2025-02-06 17:05:57 -05:00
ccurme
ab09490c20
openai: release 0.3.4 (#29652) 2025-02-06 17:02:21 -05:00
ccurme
29a0c38cc3
openai[patch]: add test for message.name (#29651) 2025-02-06 16:49:28 -05:00
ccurme
91cca827c0
tests: release 0.3.11 (#29648) 2025-02-06 21:48:09 +00:00
Sunish Sheth
25ce1e211a
docs: Updating the imports for langchain-databricks to databricks-langchain (#29646)
2025-02-06 13:28:07 -08:00
ccurme
e1b593ae77
text-splitters[patch]: release 0.3.6 (#29647) 2025-02-06 16:16:05 -05:00
ccurme
a91e58bc10
core: release 0.3.34 (#29644) 2025-02-06 15:53:56 -05:00
Vincent Emonet
08b9eaaa6f
community: improve FastEmbedEmbeddings support for ONNX execution provider (e.g. GPU) (#29645)
I changed how GPU support is implemented in `FastEmbedEmbeddings` to be
more consistent with the existing `langchain-qdrant` sparse embeddings
implementation.

It now lets the user directly provide the list of ONNX execution providers:
https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/fastembed_sparse.py#L15

It is a bit less clear to a user who just wants to enable GPU, but it
gives more flexibility to work with execution providers other than
`CUDAExecutionProvider`, and is more future-proof.

Sorry for the disturbance @ccurme

> Nice to see you just moved to `uv`! It is so much nicer to run
format/lint/test! No need to manually rerun the `poetry install` with
all required extras now
2025-02-06 15:31:23 -05:00
ccurme
3450bfc806
infra: add UV_FROZEN to makefiles (#29642)
These are set in GitHub workflows, but we forgot to add them to most
makefiles for convenience when developing locally.

`uv run` will automatically sync the lock file. Because many of our
development dependencies are local installs, it will pick up version
changes and update the lock file. Passing `--frozen` or setting this
environment variable disables the behavior.
2025-02-06 14:36:54 -05:00
ccurme
d172984c91
infra: migrate to uv (#29566) 2025-02-06 13:36:26 -05:00
ccurme
9da06e6e94
standard-tests[patch]: use has_structured_output property to engage structured output tests (#29635)
Motivation: dedicated structured output features are becoming more
common, such that integrations can support structured output without
supporting tool calling.

Here we make two changes:

1. Update the `has_structured_output` method to default to True if a
model supports tool calling (in addition to defaulting to True if
`with_structured_output` is overridden).
2. Update structured output tests to engage if `has_structured_output`
is True.
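
The defaulting rule in the two changes above can be sketched with plain classes. All names here are hypothetical stand-ins for the real base model and test classes, and method-override detection is one assumed way to decide what a model supports.

```python
class BaseChatModelSketch:
    """Hypothetical base model whose optional features are unimplemented."""

    def bind_tools(self, tools):
        raise NotImplementedError

    def with_structured_output(self, schema):
        raise NotImplementedError


class ToolCallingModel(BaseChatModelSketch):
    def bind_tools(self, tools):
        return self


class StructuredOnlyModel(BaseChatModelSketch):
    def with_structured_output(self, schema):
        return self


class ChatModelTestsSketch:
    """Hypothetical test harness holding the model class under test."""

    def __init__(self, chat_model_class):
        self.chat_model_class = chat_model_class

    @property
    def has_tool_calling(self) -> bool:
        return self.chat_model_class.bind_tools is not BaseChatModelSketch.bind_tools

    @property
    def has_structured_output(self) -> bool:
        overridden = (
            self.chat_model_class.with_structured_output
            is not BaseChatModelSketch.with_structured_output
        )
        # True if the method is overridden OR the model supports tool calling.
        return overridden or self.has_tool_calling
```

Under this rule a model with dedicated structured output but no tool calling still engages the structured output tests.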
2025-02-06 10:09:06 -08:00
Vincent Emonet
db8201d4da
community: fix typo in the module imported when using GPU with FastEmbedEmbeddings (#29631)
I made a mistake in the module to import (the module stays the same; only
the installed package changes). Fixed it and tested it.

https://github.com/langchain-ai/langchain/pull/29627

@ccurme
2025-02-06 10:26:08 -05:00
Mohammed Abbadi
f8fd65dea2
community: Update deeplake.py (#29633)
Deep Lake recently released version 4, which introduces significant
architectural changes, including a new on-disk storage format, enhanced
indexing mechanisms, and improved concurrency. However, LangChain's
vector store integration currently does not support Deep Lake v4 due to
breaking API changes.

Previously, the installation command was:
`pip install deeplake[enterprise]`
This installs the latest available version, which now defaults to Deep
Lake v4. Since LangChain's vector store integration is still dependent
on v3, this can lead to compatibility issues when using Deep Lake as a
vector database within LangChain.

To ensure compatibility, the installation command has been updated to:
`pip install deeplake[enterprise]<4.0.0`
This constraint ensures that pip installs the latest available version
of Deep Lake within the v3 series while avoiding the incompatible v4
update.
2025-02-06 10:25:13 -05:00
Vincent Emonet
0ac5536f04
community: add support for using GPUs with FastEmbedEmbeddings (#29627)
- **Description:** add a `gpu: bool = False` field to the
`FastEmbedEmbeddings` class, which enables GPU use (through the ONNX CUDA
provider) when generating embeddings with any fastembed model. It just
requires the user to install a different dependency, and we use a
different provider when instantiating `fastembed.TextEmbedding`.
- **Issue:** when generating embeddings for a very large number of
documents, this drastically increases performance (in some situations it
is a must-have, because CPU alone is far too slow).
- **Dependencies:** no direct change to dependencies, but users will
need to install `fastembed-gpu` instead of `fastembed`. I updated the
init function to tell the user which dependency to install depending on
whether `gpu` is enabled.
 
cf. fastembed docs about GPU for more details:
https://qdrant.github.io/fastembed/examples/FastEmbed_GPU/
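
The selection logic described above might be sketched as a small helper. `fastembed_requirements` is a hypothetical name, and returning `None` for the CPU case (meaning "use the library default") is an assumption, not the actual init code.

```python
from typing import List, Optional, Tuple


def fastembed_requirements(gpu: bool = False) -> Tuple[str, Optional[List[str]]]:
    """Return the package to install and the ONNX providers to request."""
    if gpu:
        return "fastembed-gpu", ["CUDAExecutionProvider"]
    return "fastembed", None  # assumed: None means the library's default providers
```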

I did not add a test because it would require access to a GPU in the
testing environment.
2025-02-06 08:04:19 -05:00
Dmitrii Rashchenko
0ceda557aa
add o1 and o3-mini to pricing (#29628)
### PR Title:  
**community: add latest OpenAI models pricing**  

### Description:  
This PR updates the OpenAI model cost calculation mapping by adding the
latest OpenAI models, **o1 (non-preview)** and **o3-mini**, based on the
pricing listed on the [OpenAI pricing
page](https://platform.openai.com/docs/pricing).

### Changes:  
- Added pricing for `o1`, `o1-2024-12-17`, `o1-cached`, and
`o1-2024-12-17-cached` for input tokens.
- Added pricing for `o1-completion` and `o1-2024-12-17-completion` for
output tokens.
- Added pricing for `o3-mini`, `o3-mini-2025-01-31`, `o3-mini-cached`,
and `o3-mini-2025-01-31-cached` for input tokens.
- Added pricing for `o3-mini-completion` and
`o3-mini-2025-01-31-completion` for output tokens.
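
The key naming above (`<model>`, `<model>-cached`, `<model>-completion`) suggests a lookup along these lines. This is a sketch: the prices are placeholders and not real OpenAI pricing, the per-1K-token unit is an assumption, and `token_cost` is a hypothetical helper.

```python
from typing import Dict

# Placeholder values for illustration only; NOT real OpenAI prices.
PLACEHOLDER_COST_PER_1K: Dict[str, float] = {
    "o1": 1.0,  # uncached input tokens
    "o1-cached": 0.5,  # cached input tokens
    "o1-completion": 4.0,  # output tokens
}


def token_cost(
    model: str, prompt_tokens: int, completion_tokens: int, cached_tokens: int = 0
) -> float:
    """Combine the three per-model entries into one cost figure."""
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * PLACEHOLDER_COST_PER_1K[model]
        + cached_tokens * PLACEHOLDER_COST_PER_1K[f"{model}-cached"]
        + completion_tokens * PLACEHOLDER_COST_PER_1K[f"{model}-completion"]
    )
    return total / 1000
```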

### Issue:  
N/A  

### Dependencies:  
None  

### Testing & Validation:  
- No functional changes outside of updating the cost mapping.  
- No tests were added or modified.
2025-02-06 08:02:20 -05:00
ZhangShenao
ac53977dbc
[MistralAI] Improve MistralAIEmbeddings (#29242)
- Add static method decorator for method.
- Add expected exception for retry decorator

#29125
2025-02-05 21:31:54 -05:00
Andrew Wason
22aa5e07ed
standard-tests: Fix ToolsIntegrationTests to correctly handle "content_and_artifact" tools (#29391)
**Description:**

The response from `tool.invoke()` is always a ToolMessage, with content
and artifact fields, not a tuple.
The tuple is converted to a ToolMessage here

b6ae7ca91d/libs/core/langchain_core/tools/base.py (L726)

**Issue:**

Currently `ToolsIntegrationTests` requires `invoke()` to return a tuple
and so standard tests fail for "content_and_artifact" tools. This fixes
that to check the returned ToolMessage.

This PR also adds a test that now passes.
2025-02-05 21:27:09 -05:00
Mohammad Anash
f849305a56
fixed Bug in PreFilter of AzureCosmosDBNoSqlVectorSearch (#29613)
Description: Fixes PreFilter value handling in Azure Cosmos DB NoSQL
vectorstore. The current implementation fails to handle numeric values
in filter conditions, causing an undefined value variable error. This PR
adds support for numeric, boolean, and NULL values while maintaining the
existing string and list handling.

Changes:
- Added handling for numeric types (int/float)
- Added boolean value support
- Added NULL value handling
- Added type validation for unsupported values
- Fixed scope of value variable initialization

Issue: 
Fixes #29610

Implementation Notes:
- No changes to public API
- Backwards compatible
- Maintains consistent behavior with existing MongoDB-style filtering
- Preserves SQL injection prevention through proper value handling
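
A minimal sketch of type-aware value handling along these lines. `format_filter_value` is a hypothetical helper, not the function in the vectorstore, and the quoting shown is a simplification whose escaping rules have not been checked against the Cosmos DB query language:

```python
from typing import Any


def format_filter_value(value: Any) -> str:
    """Render a Python value as a query literal (illustrative only)."""
    if value is None:
        return "null"
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Assumed escaping: double any embedded single quote.
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, list):
        return "[" + ", ".join(format_filter_value(v) for v in value) + "]"
    # Reject anything else instead of leaving the value undefined.
    raise TypeError(f"Unsupported filter value type: {type(value).__name__}")
```

Checking `bool` before `int` matters here: otherwise `True` would be rendered as a number-like literal rather than `true`.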

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-06 02:20:26 +00:00
Philippe PRADOS
6ff0d5c807
community[minor]: 04 - Refactoring PDFMiner parser (#29526)
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the
PDFMiner parser.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-02-05 21:08:27 -05:00
Isaac Francisco
91ffd7caad
core: allow passing message dicts into ChatPromptTemplate (#29363)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-05 09:45:52 -08:00
ccurme
69595b0914
docs: fix builds (#29607)
Failing with:
> ValueError: Provider page not found for databricks-langchain. Please
add one at docs/integrations/providers/databricks-langchain.{mdx,ipynb}
2025-02-05 14:24:53 +00:00
ccurme
91a33a9211
anthropic[patch]: release 0.3.6 (#29606) 2025-02-05 14:18:02 +00:00
ccurme
5cbe6aba8f
anthropic[patch]: support citations in streaming (#29591) 2025-02-05 09:12:07 -05:00
William FH
5ae4ed791d
Drop duplicate inputs (#29589) 2025-02-04 18:06:10 -08:00
Erick Friis
65f0deb81a
packages: databricks-langchain (#29593) 2025-02-05 01:53:34 +00:00
Yoav Levy
621bba7e26
docs: add nimble as a provider (#29579)
## Description:

- Add docs/docs/providers/nimbleway.ipynb
- Add docs/docs/integrations/retrievers/nimbleway.ipynb
- Register nimbleway in libs/packages.yml

- X (twitter) handle: @urielkn / @LevyNorbit8
2025-02-04 16:47:03 -08:00
Erick Friis
50d61eafa2
partners/deepseek: release 0.1.1 (#29592) 2025-02-04 23:46:38 +00:00
Erick Friis
7edfcbb090
docs: rename to langchain-deepseek in docs (#29587) 2025-02-04 14:22:17 -08:00
Erick Friis
df8fa882b2
deepseek: bump core (#29584) 2025-02-04 10:25:46 -08:00
Erick Friis
455f65947a
deepseek: rename to langchain-deepseek from langchain-deepseek-official (#29583) 2025-02-04 17:57:25 +00:00
Philippe PRADOS
5771e561fb
[Bugfix langchain_community] Fix PyMuPDFLoader (#29550)
- **Description:** add legacy properties
- **Issue:** #29470
- **Twitter handle:** pprados
2025-02-04 09:24:40 -05:00
Ashutosh Kumar
65b404a2d1
[oci_generative_ai] Option to pass auth_file_location (#29481)
**PR title**: "community: Option to pass auth_file_location for
oci_generative_ai"

**Description:** Adds an optional `auth_file_location` parameter that
overrides the default config file location, "~/.oci/config", where the
profile configurations are stored. This does not fix any issue; it only
exposes an option that is supported internally by any OCI client,
including GenerativeAiInferenceClient.
2025-02-03 21:44:13 -05:00
Teruaki Ishizaki
aeb42dc900
partners: Fixed the procedure of initializing pad_token_id (#29500)
- **Description:** Check the `pad_token_id` and `eos_token_id` of the model
config when initializing `pad_token_id`. This appears to be the same bug as
the HuggingFace TGI bug. It is the same bug as #29434.
- **Issue:** #29431
- **Dependencies:** none
- **Twitter handle:** tell14

Example code:
```python
from langchain_huggingface.llms import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
)

from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | hf

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))
```
2025-02-03 21:40:33 -05:00
AmirPoursaberi
a6efd22ba1
Fix a tiny typo in create_retrieval_chain docstring (#29552)
Hi there!

This fixes a tiny typo in the `create_retrieval_chain` docstring.
2025-02-03 10:54:49 -05:00
Hemant Rawat
db1693aa70
community: fix issue #29429 in age_graph.py (#29506)
## Description:

This PR addresses issue #29429 by fixing the _wrap_query method in
langchain_community/graphs/age_graph.py. The method now correctly
handles Cypher queries with UNION and EXCEPT operators, ensuring that
the fields in the SQL query are ordered as they appear in the Cypher
query. Additionally, the method now properly handles cases where RETURN
* is not supported.

### Issue: #29429

### Dependencies: None


### Add tests and docs:

Added unit tests in tests/unit_tests/graphs/test_age_graph.py to
validate the changes.
No new integrations were added, so no example notebook is necessary.

### Lint and test:

Ran `make format`, `make lint`, and `make test` to ensure code quality and
functionality.
2025-02-01 21:24:45 -05:00
Keenan Pepper
2f97916dea
docs: Add goodfire notebook and add to packages.yml (#29512)
- **Description:** Add Goodfire ipynb notebook and add
langchain-goodfire package to packages.yml
- **Issue:** n/a
- **Dependencies:** docs only
- **Twitter handle:** keenanpepper

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-01 19:43:20 -05:00
ccurme
a3c5e4d070
deepseek[patch]: bump langchain-openai and add to scheduled testing (#29535) 2025-02-01 18:40:59 -05:00
ccurme
16a422f3fa
community: add standard tests for Perplexity (#29534) 2025-02-01 17:02:57 -05:00
Amit Ghadge
0c405245c4
[Integrations][Tool] Added Jenkins tools support (#29516)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-31 12:50:10 -05:00
Christophe Bornet
aab2e42169
core[patch]: Use Blockbuster to detect blocking calls in asyncio during tests (#29043)
This PR uses the [blockbuster](https://github.com/cbornet/blockbuster)
library in langchain-core to detect blocking calls made in the asyncio
event loop during unit tests.
Avoiding blocking calls is hard as these can be deeply buried in the
code or made in 3rd party libraries.
Blockbuster makes it easier to detect them by raising an exception when
a call is made to a known blocking function (eg: `time.sleep`).
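
The underlying idea can be sketched with the standard library alone. This is not how Blockbuster is implemented and does not use its API; `detect_blocking_sleep` and `BlockingCallError` are hypothetical names, and only `time.sleep` is guarded:

```python
import asyncio
import time
from contextlib import contextmanager


class BlockingCallError(RuntimeError):
    """Raised when a known blocking function runs inside a running event loop."""


@contextmanager
def detect_blocking_sleep():
    original_sleep = time.sleep

    def guarded_sleep(seconds):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop is running in this thread, so blocking is harmless.
            return original_sleep(seconds)
        raise BlockingCallError("time.sleep called inside the event loop")

    time.sleep = guarded_sleep
    try:
        yield
    finally:
        # Always restore the real function, even if the body raised.
        time.sleep = original_sleep


async def bad_coroutine():
    time.sleep(0.01)  # blocks the loop; should be `await asyncio.sleep(...)`


def run_check() -> bool:
    """Return True if the blocking call inside the coroutine was detected."""
    with detect_blocking_sleep():
        try:
            asyncio.run(bad_coroutine())
        except BlockingCallError:
            return True
    return False
```

Calling `run_check()` exercises the guard: the coroutine's `time.sleep` is intercepted because a loop is running, while the same call made outside a loop passes through to the real function.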

Adding blockbuster allowed to find a blocking call in
`aconfig_with_context` (it ends up calling `get_function_nonlocals`
which loads function code).

**Dependencies:**
- blockbuster (test)

**Twitter handle:** cbornet_
2025-01-31 10:06:34 -05:00
Philippe PRADOS
ceda8bc050
community[minor]: 03 - Refactoring PyPDF parser (#29330)
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses on updating the PyPDF parser.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).
2025-01-31 10:05:07 -05:00
Julian Castro Pulgarin
b7e3e337b1
community: Fix YahooFinanceNewsTool to handle updated yfinance data structure (#29498)
**Description:**
Updates the YahooFinanceNewsTool to handle the current yfinance news
data structure. The tool was failing with a KeyError due to changes in
the yfinance API's response format. This PR updates the code to
correctly extract news URLs from the new structure.

**Issue:** #29495

**Dependencies:** 
No new dependencies required. Works with existing yfinance package.

The changes maintain backwards compatibility while fixing the KeyError
that users were experiencing.

The modified code properly handles the new data structure where:
- News type is now at `content.contentType`
- News URL is now at `content.canonicalUrl.url`
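
A rough sketch of reading the nested structure described above. The sample items are invented for illustration, `extract_story_urls` is a hypothetical helper rather than the tool's code, and the `"STORY"` default is an assumption about which content type is wanted:

```python
from typing import Any


def extract_story_urls(
    news: list[dict[str, Any]], wanted_type: str = "STORY"
) -> list[str]:
    urls = []
    for item in news:
        # Missing keys are tolerated so one malformed item cannot raise KeyError.
        content = item.get("content") or {}
        if content.get("contentType") != wanted_type:
            continue
        url = (content.get("canonicalUrl") or {}).get("url")
        if url:
            urls.append(url)
    return urls


# Invented sample data that mirrors the two paths named in the description.
sample_news = [
    {"content": {"contentType": "STORY", "canonicalUrl": {"url": "https://example.com/a"}}},
    {"content": {"contentType": "VIDEO", "canonicalUrl": {"url": "https://example.com/b"}}},
    {"content": {"contentType": "STORY"}},
]
```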

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-31 02:31:44 +00:00
Erick Friis
332e303858
partners/mistralai: release 0.2.6 (#29491) 2025-01-29 22:23:14 +00:00
Erick Friis
2c795f5628
partners/openai: release 0.3.3 (#29490) 2025-01-29 22:23:03 +00:00
Erick Friis
f307b3cc5f
langchain: release 0.3.17 (#29485) 2025-01-29 22:22:49 +00:00
Erick Friis
5cad3683b4
partners/groq: release 0.2.4 (#29488) 2025-01-29 22:22:30 +00:00
Erick Friis
e074c26a6b
partners/fireworks: release 0.2.7 (#29487) 2025-01-29 22:22:18 +00:00
Erick Friis
685609e1ef
partners/anthropic: release 0.3.5 (#29486) 2025-01-29 22:22:11 +00:00
Erick Friis
ed3a5e664c
standard-tests: release 0.3.10 (#29484) 2025-01-29 22:21:05 +00:00
Erick Friis
29461b36d9
partners/ollama: release 0.2.3 (#29489) 2025-01-29 22:19:44 +00:00
Erick Friis
07e2e80fe7
core: release 0.3.33 (#29483) 2025-01-29 14:11:53 -08:00
Erick Friis
8f95da4eb1
multiple: structured output tracing standard metadata (#29421)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-29 14:00:26 -08:00
ccurme
284c935b08
tests[patch]: improve coverage of structured output tests (#29478) 2025-01-29 14:52:09 -05:00
Matheus Torquato
7aae738296
docs:Fix Imports for Document and BaseRetriever (#29473)
This pull request addresses an issue with import statements in the
langchain_core/retrievers.py file. The following changes have been made:

Corrected the import for Document from langchain_core.documents.base.
Corrected the import for BaseRetriever from langchain_core.retrievers.
These changes ensure that the SimpleRetriever class can correctly
reference the Document and BaseRetriever classes, improving code
reliability and maintainability.

---------

Co-authored-by: Matheus Torquato <mtorquat@jaguarlandrover.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-29 14:32:05 +00:00
Mohammad Anash
12bcc85927
added operator filter for supabase (#29475)
Description
This PR adds support for MongoDB-style $in operator filtering in the
Supabase vectorstore implementation. Currently, filtering with $in
operators returns no results, even when matching documents exist. This
change properly translates MongoDB-style filters to PostgreSQL syntax,
enabling efficient multi-document filtering.

Changes
- Modified similarity_search_by_vector_with_relevance_scores to handle
MongoDB-style $in operators
- Added automatic conversion of $in filters to PostgreSQL IN clauses
- Preserved original vector type handling and numpy array conversion
- Maintained compatibility with existing postgrest filters
- Added support for the same filtering in
similarity_search_by_vector_returning_embeddings

Issue
Closes #27932

Implementation Notes
- No changes to public API or function signatures
- Backwards compatible - behavior unchanged for non-$in filters
- More efficient than multiple individual queries for multi-ID searches
- Preserves all existing functionality including numpy array conversion
for vector types

Dependencies
None

Additional Notes
- The implementation handles proper SQL escaping for filter values
- Maintains consistent behavior with other vectorstore implementations
that support MongoDB-style operators
- Future extensions could support additional MongoDB-style operators ($gt,
$lt, etc.)
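
To illustrate the translation in isolation, here is a sketch that turns a MongoDB-style filter into a parameterized SQL fragment. The actual change works through the Supabase client, so `translate_in_filter` and the `%s` placeholder style are assumptions made purely for this example:

```python
from typing import Any


def translate_in_filter(filter_: dict[str, Any]) -> tuple[str, list[Any]]:
    """Turn {"col": {"$in": [...]}} or {"col": value} into a WHERE fragment."""
    clauses: list[str] = []
    params: list[Any] = []
    for column, condition in filter_.items():
        if isinstance(condition, dict) and "$in" in condition:
            values = list(condition["$in"])
            if not values:
                # An empty $in list can never match anything.
                clauses.append("FALSE")
                continue
            placeholders = ", ".join(["%s"] * len(values))
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
        else:
            # Plain equality keeps its existing behavior.
            clauses.append(f"{column} = %s")
            params.append(condition)
    return " AND ".join(clauses), params
```

Values travel as parameters rather than being spliced into the string; column names are interpolated directly here, so this sketch assumes they come from trusted code.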

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-29 14:24:18 +00:00
ccurme
585f467d4a
mistral[patch]: release 0.2.5 (#29463) 2025-01-28 18:29:54 -05:00
ccurme
ca9d4e4595
mistralai: support method="json_schema" in structured output (#29461)
https://docs.mistral.ai/capabilities/structured-output/custom_structured_output/
2025-01-28 18:17:39 -05:00
Michael Chin
e120378695
community: Additional AWS deprecations (#29447)
Added deprecation warnings for a few more classes that were moved to
`langchain-aws` package:
- [SageMaker Endpoint
LLM](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html)
- [Amazon Kendra
retriever](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.kendra.AmazonKendraRetriever.html)
- [Amazon Bedrock Knowledge Bases
retriever](https://python.langchain.com/api_reference/aws/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html)
2025-01-28 09:50:14 -05:00
Erick Friis
2d776351af
community: release 0.3.16 (#29452) 2025-01-28 07:44:54 +00:00
Erick Friis
737a68fcdc
langchain: release 0.3.16 (#29451) 2025-01-28 07:31:09 +00:00
Erick Friis
8bf9c71673
core: release 0.3.32 (#29450) 2025-01-28 07:20:04 +00:00
Erick Friis
ecdc881328
langchain: add deepseek provider to init chat model (#29449) 2025-01-27 23:13:59 -08:00
Erick Friis
dced0ed3fd
deepseek, docs: chatdeepseek integration added (#29445) 2025-01-28 06:32:58 +00:00
Isaac Francisco
2bb2c9bfe8
change behavior for converting a string to openai messages (#29446) 2025-01-27 18:18:54 -08:00
ccurme
b1fdac726b
groq[patch]: update model used in test (#29441)
`llama-3.1-70b-versatile` was [shut
down](https://console.groq.com/docs/deprecations).
2025-01-27 21:11:44 +00:00
Adrián Panella
1551d9750c
community(doc_loaders): allow any credential type in AzureAIDocumentI… (#29289)
Allow any credential type in AzureAIDocumentIntelligence, not only
`api_key`.
This makes it possible to use any of the credential types integrated with AD.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-27 20:56:30 +00:00
ccurme
f00c66cc1f
chroma[patch]: release 0.2.1 (#29440) 2025-01-27 20:41:35 +00:00
Jorge Piedrahita Ortiz
3b886cdbb2
libs: add sambanova-langchain integration package (#29417)
- **Description:** Add sambanova-langchain integration package as
suggested in previous PRs

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-27 20:34:55 +00:00
Mohammad Anash
aba1fd0bd4
fixed similarity search with score error #29407 (#29413)
Description: Fix TypeError in AzureSearch similarity_search_with_score
by removing search_type from kwargs before passing to underlying
requests.

This resolves issue #29407 where search_type was being incorrectly
passed through to Session.request().
Issue: #29407

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-27 20:34:42 +00:00
itaismith
7b404fcd37
partners[chroma]: Upgrade Chroma to 0.6.x (#29404)
2025-01-27 15:32:21 -05:00
Teruaki Ishizaki
3fce78994e
community: Fixed the procedure of initializing pad_token_id (#29434)
- **Description:** Check the `pad_token_id` and `eos_token_id` of the model
config when initializing `pad_token_id`. This appears to be the same bug as
the HuggingFace TGI bug.
In addition, the source code of
libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py
also requires similar changes.
- **Issue:** #29431
- **Dependencies:** none
- **Twitter handle:** tell14
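
One plausible reading of the fallback order, sketched as a pure function. `resolve_pad_token_id` and its parameter names are hypothetical, and the order (config pad id, then config EOS id, then tokenizer EOS id) is an assumption rather than a description of the merged code:

```python
from typing import Optional


def resolve_pad_token_id(
    config_pad_token_id: Optional[int],
    config_eos_token_id: Optional[int],
    tokenizer_eos_token_id: Optional[int],
) -> Optional[int]:
    # Compare against None explicitly: token id 0 is a valid id and must not
    # be skipped by a truthiness check.
    for candidate in (
        config_pad_token_id,
        config_eos_token_id,
        tokenizer_eos_token_id,
    ):
        if candidate is not None:
            return candidate
    return None
```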
2025-01-27 14:54:54 -05:00
Christophe Bornet
dbb6b7b103
core: Add ruff rules TRY (tryceratops) (#29388)
TRY004 ("use TypeError rather than ValueError") existing errors are
marked as ignore to preserve backward compatibility.
LMK if you prefer to fix some of them.
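
As a small illustration of what an ignored TRY004 finding looks like, the hypothetical function below keeps raising `ValueError` after a type check and suppresses the rule on that line:

```python
def repeat(text: object, times: int) -> str:
    if not isinstance(text, str):
        # TRY004 would suggest TypeError here; ValueError is kept so that
        # existing `except ValueError` callers keep working.
        raise ValueError(f"text must be a str, got {type(text).__name__}")  # noqa: TRY004
    return text * times
```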

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-24 05:01:40 +00:00
Erick Friis
723b603f52
docs: groq api key links (#29402) 2025-01-24 04:33:18 +00:00
ccurme
bbc50f65e7
anthropic[patch]: release 0.3.4 (#29399) 2025-01-23 23:55:58 +00:00
ccurme
ed797e17fb
anthropic[patch]: always return content blocks if citations are generated (#29398)
We currently return string (and therefore no content blocks / citations)
if the response is of the form
```
[
    {"text": "a claim", "citations": [...]},
]
```

There are other cases where we do return citations as-is:
```
[
    {"text": "a claim", "citations": [...]},
    {"text": "some other text"},
    {"text": "another claim", "citations": [...]},
]
```
Here we update to return content blocks including citations in the first
case as well.
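
The decision described above can be sketched roughly as follows. `format_content` is a hypothetical helper, not the function in `langchain-anthropic`, and it assumes a lone text block without citations is still collapsed to a plain string:

```python
from typing import Any, Union


def format_content(
    blocks: list[dict[str, Any]],
) -> Union[str, list[dict[str, Any]]]:
    has_citations = any(block.get("citations") for block in blocks)
    # Only a single, citation-free text block is collapsed to a string.
    if len(blocks) == 1 and "text" in blocks[0] and not has_citations:
        return blocks[0]["text"]
    # Anything carrying citations, or with several blocks, stays as blocks.
    return blocks
```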
2025-01-23 18:47:23 -05:00
Bagatur
317fb86fd9
openai[patch]: fix int test (#29395) 2025-01-23 21:23:01 +00:00
Bagatur
8d566a8fe7
openai[patch]: detect old models in with_structured_output (#29392)
Co-authored-by: ccurme <chester.curme@gmail.com>
2025-01-23 20:47:32 +00:00
Christophe Bornet
b6ae7ca91d
core: Cache RunnableLambda __repr__ (#29199)
`RunnableLambda`'s `__repr__` may perform a costly OS operation by calling
`get_lambda_source`,
so it is better to cache the result.
See #29043
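
A minimal sketch of the caching pattern, using a hypothetical wrapper class rather than `RunnableLambda` itself. `inspect.getsource` stands in for the costly lookup, and the counter exists only to make the caching observable:

```python
import inspect
from typing import Any, Callable, Optional


class CachedReprCallable:
    """Hypothetical wrapper; not the RunnableLambda implementation."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self._repr: Optional[str] = None
        self.source_lookups = 0  # counts the expensive calls, for illustration

    def __repr__(self) -> str:
        if self._repr is None:
            self.source_lookups += 1
            try:
                # inspect.getsource reads the defining file from disk.
                source = inspect.getsource(self.func).strip()
            except (OSError, TypeError):
                # Builtins and dynamically defined functions have no source.
                source = getattr(self.func, "__name__", "...")
            self._repr = f"CachedReprCallable({source})"
        return self._repr


wrapped = CachedReprCallable(len)
first = repr(wrapped)
second = repr(wrapped)  # served from the cached string
```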

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-23 18:34:47 +00:00
Christophe Bornet
618e550f06
core: Cache RunnableLambda deps (#29200)
`RunnableLambda`'s `deps` may perform a costly OS operation by calling
`get_function_nonlocals`,
so it is better to cache the result.
See #29043
2025-01-23 13:09:07 -05:00
ccurme
f795ab99ec
docs: fix title rendered for integration package (#29387)
"Tilores LangchAIn" -> "Tilores"
2025-01-23 12:21:19 -05:00
Stefan Berkner
8977451c76
docs: add Tilores provider and tools (#29244)
Description: This PR adds documentation for the Tilores provider and
tools.
Issue: closes #26320
2025-01-23 12:17:59 -05:00
Ahmed Tammaa
d5b8aabb32
text-splitters[patch]: delete unused html_chunks_with_headers.xslt (#29340)
This pull request removes the now-unused html_chunks_with_headers.xslt
file from the codebase. In a previous update ([PR
#27678](https://github.com/langchain-ai/langchain/pull/27678)), the
HTMLHeaderTextSplitter class was refactored to utilize BeautifulSoup
instead of lxml and XSLT for HTML processing. As a result, the
html_chunks_with_headers.xslt file is no longer necessary and can be
safely deleted to maintain code cleanliness and reduce potential
confusion.

Issue: N/A

Dependencies: N/A
2025-01-23 11:29:08 -05:00
Wang Ran (汪然)
8f2c11e17b
core[patch]: fix API reference for draw_ascii (#29370)
Fixes a typo: the method is `draw_ascii`, not `draw`, along with other small fixes.

now, it works:
<img width="688" alt="image"
src="https://github.com/user-attachments/assets/5b5a8cc2-cf81-4a5c-b443-da0e4426556c"
/>

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-23 16:04:58 +00:00
Loris Alexandre
e4921239a6
community: missing mandatory parameter partition_key for AzureCosmosDBNoSqlVectorSearch (#29382)
- **Description:** the `delete` function of
AzureCosmosDBNoSqlVectorSearch calls
`self._container.delete_item(document_id)`, which is missing the mandatory
`partition_key` parameter.
We now use the class method `delete_document_by_id`, which provides a default
`partition_key`.
- **Issue:** #29372 
- **Dependencies:** None
- **Twitter handle:** None

Co-authored-by: Loris Alexandre <loris.alexandre@boursorama.fr>
2025-01-23 10:05:10 -05:00
Terry Tan
ec0ebb76f2
community: fix Google Scholar tool errors (#29371)
Resolve https://github.com/langchain-ai/langchain/issues/27557
2025-01-23 10:03:01 -05:00
江同学呀
a1e62070d0
community: Fix the problem of error reporting when OCR extracts text from PDF. (#29378)
- **Description:** Fixes an issue where images could not be
recognized from `xObject[obj]["/Filter"]` (whose value can be either
a string or a list of strings) in the `_extract_images_from_page()`
method. It also resolves a bug where vectorization with Faiss fails due
to the failure of image extraction from a PDF containing only
images (`IndexError: list index out of range`).

![69a60f3f6bd474641b9126d74bb18f7e](https://github.com/user-attachments/assets/dc9e098d-2862-49f7-93b0-00f1056727dc)

- **Issue:** 
    Fix the following issues:
[#15227 ](https://github.com/langchain-ai/langchain/issues/15227)
[#22892 ](https://github.com/langchain-ai/langchain/issues/22892)
[#26652 ](https://github.com/langchain-ai/langchain/issues/26652)
[#27153 ](https://github.com/langchain-ai/langchain/issues/27153)
    Related issues:
[#7067 ](https://github.com/langchain-ai/langchain/issues/7067)

- **Dependencies:** None
- **Twitter handle:** None
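
Handling a value that may be either a string or a list of strings usually comes down to normalizing it first. A minimal sketch, with hypothetical helper names and an invented set of filter names passed in by the caller:

```python
from typing import Union


def normalize_filters(filter_value: Union[str, list[str]]) -> list[str]:
    # A single filter arrives as a bare string, several as a list.
    if isinstance(filter_value, str):
        return [filter_value]
    return list(filter_value)


def uses_any_filter(
    filter_value: Union[str, list[str]], known_filters: set[str]
) -> bool:
    # After normalization the membership test is identical for both shapes.
    return any(name in known_filters for name in normalize_filters(filter_value))
```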

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-23 15:01:52 +00:00
Tim Mallezie
a13faab6b7
community: allow to set gitlab url in gitlab tool in constructor (#29380)
This PR makes it possible to set the GitLab URL in the constructor,
instead of only through environment variables.

This allows something like the following:
```python
# Create the GitLab API wrapper
gitlab_api = GitLabAPIWrapper(
    gitlab_url=self.gitlab_url,
    gitlab_personal_access_token=self.gitlab_personal_access_token,
    gitlab_repository=self.gitlab_repository,
    gitlab_branch=self.gitlab_branch,
    gitlab_base_branch=self.gitlab_base_branch,
)
```
Previously, the URL could not be set in the constructor.
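
The general pattern, an explicit constructor argument that falls back to an environment variable, can be sketched as below. `ServiceConfig` and the `EXAMPLE_SERVICE_URL` variable are hypothetical and unrelated to the actual `GitLabAPIWrapper` internals:

```python
import os
from typing import Optional


class ServiceConfig:
    """Hypothetical config holder; not the GitLabAPIWrapper class."""

    def __init__(
        self, url: Optional[str] = None, env_var: str = "EXAMPLE_SERVICE_URL"
    ) -> None:
        # An explicit argument wins; the environment is only a fallback.
        resolved = url or os.environ.get(env_var)
        if not resolved:
            raise ValueError(
                f"Pass url explicitly or set the {env_var} environment variable"
            )
        self.url = resolved
```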

Co-authored-by: Tim Mallezie <tim.mallezie@dropsolid.com>
2025-01-23 09:36:27 -05:00
Tyllen
f2ea62f632
docs: add payman docs (#29362)
- **Description:** Adding the docs to use the payman-langchain
integration :)
2025-01-22 18:37:47 -08:00
Erick Friis
3f1d20964a
standard-tests: release 0.3.9 (#29356) 2025-01-22 09:46:19 -08:00
Macs Dickinson
7378c955db
community: adds support for getting github releases for the configured repository (#29318)
**Description:** adds support for the GitHub tool to query GitHub releases
on the configured repository
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** @macsdickinson

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-22 15:45:52 +00:00
Tayaa Med Amine
ef1610e24a
langchain[patch]: support ollama in init_embeddings (#29349)
Why not Ollama ?

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-22 14:47:12 +00:00
Siddhant
9eb10a9240
langchain: added vectorstore docstring linting (#29241)
…ore.py

Added docstring linting in the vectorstore.py file relating to issue
#25154



---------

Co-authored-by: Siddhant Jain <sjain35@buffalo.edu>
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 03:47:43 +00:00
Sohan
de1fc4811d
packages, docs: Pipeshift - Langchain integration of pipeshift (#29114)
Description: Added the Pipeshift integration, which integrates the Pipeshift
LLM and ChatModels APIs with LangChain.
Dependencies: none

Unit tests and integration tests are added.

Documentation is added as well.

This PR follows up on
[#27390](https://github.com/langchain-ai/langchain/pull/27390). As
requested there, a new `langchain-pipeshift` package has been uploaded
to PyPI. Only the docs and packages.yml are changed in the langchain
master branch.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 03:03:06 +00:00
Christophe Bornet
836c791829
text-splitters: Bump ruff version to 0.9 (#29231)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:27:58 +00:00
Christophe Bornet
a004dec119
langchain: Bump ruff version to 0.9 (#29211)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:26:39 +00:00
Christophe Bornet
2340b3154d
standard-tests: Bump ruff version to 0.9 (#29230)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:23:01 +00:00
Christophe Bornet
e4a78dfc2a
core: Bump ruff version to 0.9 (#29201)
Also run some preview autofix and formatting

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:20:09 +00:00
Ella Charlaix
6f95db81b7
huggingface: Add IPEX models support (#29179)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:16:44 +00:00
Bhav Sardana
d6a7aaa97d
community: Fix for Pydantic model validator of GoogleApiClient (#29346)
- **Description:** Fix for the Pydantic model validator for GoogleApiHandler
- **Issue:** #29165

---------

Signed-off-by: Bhav Sardana <sardana.bhav@gmail.com>
2025-01-21 15:17:43 -05:00
Christophe Bornet
1c4ce7b42b
core: Auto-fix some docstrings (#29337) 2025-01-21 13:29:53 -05:00
ccurme
86a0720310
fireworks[patch]: update model used in integration tests (#29342)
No access to firefunction-v1 and -v2.
2025-01-21 11:05:30 -05:00
Hugo Berg
32c9c58adf
Community: fix missing f-string modifier in oai structured output parsing error (#29326)
- **Description:** The ValueError raised on certain structured-output
parsing errors in the langchain OpenAI community integration was missing an
f-string prefix, so it did not produce useful output. This is a
2-line, 2-character change.
- **Issue:** None open that this fixes
- **Dependencies:** Nothing changed
- **Twitter handle:** None

- [X] **Add tests and docs**: There's nothing to add for.
- [-] **Lint and test**: Happy to run this if you deem it necessary.
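
The effect of a missing `f` prefix is easy to reproduce. The message text below is made up for illustration and is not the string from the integration:

```python
def build_messages(error: Exception) -> tuple[str, str]:
    # Without the f prefix the braces are kept literally in the message.
    broken = "Failed to parse structured output: {error}"
    # With the f prefix the expression is interpolated.
    fixed = f"Failed to parse structured output: {error}"
    return broken, fixed


broken, fixed = build_messages(ValueError("bad JSON"))
```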

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-21 14:26:38 +00:00
Nuno Campos
566915d7cf
core: fix call to get closure vars for partial-wrapped funcs (#29316)
2025-01-21 09:26:15 -05:00
ZhangShenao
33e22ccb19
[Doc] Improve api doc (#29324)
- Fix doc description
- Add static method decorator
2025-01-21 09:16:08 -05:00
Bagatur
536b44a47f
community[patch]: Release 0.3.15 (#29325) 2025-01-21 03:10:07 +00:00
Bagatur
ec5fae76d4
langchain[patch]: Release 0.3.15 (#29322) 2025-01-21 02:24:11 +00:00
Bagatur
923e6fb321
core[patch]: 0.3.31 (#29320) 2025-01-21 01:17:31 +00:00
Ahmed Tammaa
d3ed9b86be
text-splitters[minor]: Replace lxml and XSLT with BeautifulSoup in HTMLHeaderTextSplitter for Improved Large HTML File Processing (#27678)
This pull request updates the `HTMLHeaderTextSplitter` by replacing the
`split_text_from_file` method's implementation. The original method used
`lxml` and XSLT for processing HTML files, which caused
`lxml.etree.xsltapplyerror maxhead` when handling large HTML documents
due to limitations in the XSLT processor. Fixes #13149

By switching to BeautifulSoup (`bs4`), we achieve:

- **Improved Performance and Reliability:** BeautifulSoup efficiently
processes large HTML files without the errors associated with `lxml` and
XSLT.
- **Simplified Dependencies:** Removes the dependency on `lxml` and
external XSLT files, relying instead on the widely used `beautifulsoup4`
library.
- **Maintained Functionality:** The new method replicates the original
behavior, ensuring compatibility with existing code and preserving the
extraction of content and metadata.

**Issue:**

This change addresses issues related to processing large HTML files with
the existing `HTMLHeaderTextSplitter` implementation. It resolves
problems where users encounter lxml.etree.xsltapplyerror maxhead due to
large HTML documents.

**Dependencies:**

- **BeautifulSoup (`beautifulsoup4`):** The `beautifulsoup4` library is
now used for parsing HTML content.
  - Installation: `pip install beautifulsoup4`

**Code Changes:**

Updated the `split_text_from_file` method in `HTMLHeaderTextSplitter` as
follows:

```python
def split_text_from_file(self, file: Any) -> List[Document]:
    """Split HTML file using BeautifulSoup.

    Args:
        file: HTML file path or file-like object.

    Returns:
        List of Document objects with page_content and metadata.
    """
    from bs4 import BeautifulSoup
    from langchain.docstore.document import Document
    import bs4

    # Read the HTML content from the file or file-like object
    if isinstance(file, str):
        with open(file, 'r', encoding='utf-8') as f:
            html_content = f.read()
    else:
        # Assuming file is a file-like object
        html_content = file.read()

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Extract the header tags and their corresponding metadata keys
    headers_to_split_on = [tag[0] for tag in self.headers_to_split_on]
    header_mapping = dict(self.headers_to_split_on)

    documents = []

    # Find the body of the document
    body = soup.body if soup.body else soup

    # Find all header tags in the order they appear
    all_headers = body.find_all(headers_to_split_on)

    # If there's content before the first header, collect it
    first_header = all_headers[0] if all_headers else None
    if first_header:
        pre_header_content = ''
        for elem in first_header.find_all_previous():
            if isinstance(elem, bs4.Tag):
                text = elem.get_text(separator=' ', strip=True)
                if text:
                    pre_header_content = text + ' ' + pre_header_content
        if pre_header_content.strip():
            documents.append(Document(
                page_content=pre_header_content.strip(),
                metadata={}  # No metadata since there's no header
            ))
    else:
        # If no headers are found, return the whole content
        full_text = body.get_text(separator=' ', strip=True)
        if full_text.strip():
            documents.append(Document(
                page_content=full_text.strip(),
                metadata={}
            ))
        return documents

    # Process each header and its associated content
    for header in all_headers:
        current_metadata = {}
        header_name = header.name
        header_text = header.get_text(separator=' ', strip=True)
        current_metadata[header_mapping[header_name]] = header_text

        # Collect all sibling elements until the next header of the same or higher level
        content_elements = []
        for sibling in header.find_next_siblings():
            if sibling.name in headers_to_split_on:
                # Stop at the next header
                break
            if isinstance(sibling, bs4.Tag):
                content_elements.append(sibling)

        # Get the text content of the collected elements
        current_content = ''
        for elem in content_elements:
            text = elem.get_text(separator=' ', strip=True)
            if text:
                current_content += text + ' '

        # Create a Document if there is content
        if current_content.strip():
            documents.append(Document(
                page_content=current_content.strip(),
                metadata=current_metadata.copy()
            ))
        else:
            # If there's no content, but we have metadata, still create a Document
            documents.append(Document(
                page_content='',
                metadata=current_metadata.copy()
            ))

    return documents
```

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 16:10:37 -05:00
Christophe Bornet
989eec4b7b
core: Add ruff rule S101 (no assert) (#29267)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 20:24:31 +00:00
Christophe Bornet
e5d62c6ce7
core: Add ruff rule W293 (whitespaces) (#29272) 2025-01-20 15:16:12 -05:00
Philippe PRADOS
4efc5093c1
community[minor]: Refactoring PyMuPDF parser, loader and add image blob parsers (#29063)
* Adds BlobParsers for images. These implementations can take an image
and produce one or more documents per image. This interface can be used
for exposing OCR capabilities.
* Update PyMuPDFParser and Loader to standardize metadata, handle
images, improve table extraction etc.

- **Twitter handle:** pprados

This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses to prepare the update of all parsers.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 15:15:43 -05:00
Syed Baqar Abbas
f175319303
[feat] Added backwards compatibility for OllamaEmbeddings initialization (migration from langchain_community.embeddings to langchain_ollama.embeddings (#29296)
- [feat] **Added backwards compatibility for OllamaEmbeddings
initialization (migration from `langchain_community.embeddings` to
`langchain_ollama.embeddings`)**: "langchain_ollama"
- **Description:** Because `OllamaEmbeddings` from
`langchain_community.embeddings` is deprecated, code is moving to
`langchain_ollama.embeddings`. However, the new class is not backward
compatible with the parameters previously used to initialize an
`OllamaEmbeddings` object.
    - **Issue:** #29294 
    - **Dependencies:** None
    - **Twitter handle:** @BaqarAbbas2001


## Additional Information
Previously, `OllamaEmbeddings` from `langchain_community.embeddings`
used to support the following options:

e9abe583b2/libs/community/langchain_community/embeddings/ollama.py (L125-L139)

However, in the new package `from langchain_ollama import
OllamaEmbeddings`, there is no method to set these options. I have added
these parameters to resolve this issue.

This issue was also discussed in
https://github.com/langchain-ai/langchain/discussions/29113
2025-01-20 11:16:29 -05:00
CLOVA Studio 개발
7a95ffc775
community: fix some features on Naver ChatModel & embedding model 2 (#29243)
## Description
- Adapt to the `NCP API Key` changes.
- Fix the `ChatClovaX` `astream` function so that it raises `SSEError`
when an error event occurs.
- Add `token length` and `ai_filter` to ChatClovaX's
`response_metadata`.
- Update the documentation to reflect the NCP API Key changes.

cc. @efriis @vbarda
2025-01-20 11:01:03 -05:00
Sangyun_LEE
5d64597490
docs: fix broken Appearance of langchain_community/document_loaders/recursive_url_loader API Reference (#29305)
# PR message
## Description
Fixed the broken appearance of the RecursiveUrlLoader API Reference.

### Before
<p align="center">
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/f39df65d-b788-411d-88af-8bfa2607c00b"
/>
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/b8a92b70-4548-4b4a-965f-026faeebd0ec"
/>
</p>

### After
<p align="center">
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/8ea28146-de45-42e2-b346-3004ec4dfc55"
/>
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/914c6966-4055-45d3-baeb-2d97eab06fe7"
/>
</p>

## Issue:
N/A
## Dependencies
None
## Twitter handle
N/A

# Add tests and docs
Not applicable; this change only affects documentation.

# Lint and test
Ran make format, make lint, and make test to ensure no issues.
2025-01-20 10:56:59 -05:00
Hemant Rawat
6c52378992
Add Google-style docstring linting and update pyproject.toml (#29303)
### Description:

This PR introduces Google-style docstring linting for the
ModelLaboratory class in libs/langchain/langchain/model_laboratory.py.
It also updates the pyproject.toml file to comply with the latest Ruff
configuration standards (deprecating top-level lint settings in favor of
the `[tool.ruff.lint]` section).

### Changes include:
- [x] Added detailed Google-style docstrings to all methods in
ModelLaboratory.
- [x] Updated pyproject.toml to move select and pydocstyle settings
under the [tool.ruff.lint] section.
- [x] Ensured all files pass Ruff linting.

Issue:
Closes #25154

### Dependencies:
No additional dependencies are required for this change.

### Checklist
- [x] Files pass ruff linting.
- [x] Docstrings conform to the Google-style convention.
- [x] pyproject.toml updated to avoid deprecation warnings.
- [x] My PR is ready to review, please review.
2025-01-19 14:37:21 -05:00
Mohammad Mohtashim
b5fbebb3c8
(Community): Changing the BaseURL and Model for MiniMax (#29299)
- **Description:** Changed the default model and base URL to the correct
values, and added a more explicit exception for when the user provides
an invalid API key.
- **Issue:** #29278
2025-01-19 14:15:02 -05:00
ccurme
c20f7418c7
openai[patch]: fix Azure LLM test (#29302)
The tokens I get are:
```
['', '\n\n', 'The', ' sun', ' was', ' setting', ' over', ' the', ' horizon', ',', ' casting', '']
```
so possibly an extra empty token is included in the output.

lmk @efriis if we should look into this further.
2025-01-19 17:25:42 +00:00
ccurme
6b249a0dc2
openai[patch]: release 0.3.1 (#29301) 2025-01-19 17:04:00 +00:00
ThomasSaulou
e9abe583b2
chatperplexity stream-citations in additional kwargs (#29273)
chatperplexity stream-citations in additional kwargs

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-18 22:31:10 +00:00
TheSongg
1cd4d8d101
[langchain_community.llms.xinference]: Rewrite _stream() method and support stream() method in xinference.py (#29259)
- [ ] **PR title**:[langchain_community.llms.xinference]: Rewrite
_stream() method and support stream() method in xinference.py

- [ ] **PR message**: Rewrite the _stream method so that the
chain.stream() can be used to return data streams.

       chain = prompt | llm
       chain.stream(input=user_input)


- [ ] **tests**:

       from langchain_community.llms import Xinference
       from langchain.prompts import PromptTemplate

       llm = Xinference(
           server_url="http://0.0.0.0:9997",  # replace with your xinference server url
           model_uid=model_uid,  # replace with the model UID returned from launching the model
           stream=True,
       )
       prompt = PromptTemplate(
           input_variables=["country"],
           template="Q: where can we visit in the capital of {country}? A:",
       )
       chain = prompt | llm
       chain.stream(input={"country": "France"})
2025-01-17 20:31:59 -05:00
ccurme
184ea8aeb2
anthropic[patch]: update tool choice type (#29276) 2025-01-17 15:26:33 -05:00
ccurme
ac52021097
anthropic[patch]: release 0.3.2 (#29275) 2025-01-17 19:48:31 +00:00
ccurme
c616b445f2
anthropic[patch]: support parallel_tool_calls (#29257)
Need to:
- Update docs
- Decide if this is an explicit kwarg of bind_tools
- Decide if this should be in standard test with flag for supporting
2025-01-17 19:41:41 +00:00
ccurme
d5360b9bd6
core[patch]: release 0.3.30 (#29256) 2025-01-16 17:52:37 -05:00
Nuno Campos
595297e2e5
core: Add support for calls in get_function_nonlocals (#29255)
2025-01-16 14:43:42 -08:00
Luis Lopez
75663f2cae
community: Add cost per 1K tokens for fine-tuned model cached input (#29248)
### Description

- No cost per 1K input tokens is defined for the fine-tuned, cached
version of `gpt-4o-mini-2024-07-18`, so the `OpenAICallbackHandler`
raises an error when calls are made with that model.
- This PR adds the price to the `MODEL_COST_PER_1K_TOKENS` dictionary.
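
A minimal sketch of why a missing table entry surfaces as an error, assuming prices are looked up by model key in a plain dictionary. The keys and prices below are placeholders invented for illustration; they are not the real contents of `MODEL_COST_PER_1K_TOKENS`, and `token_cost` is a hypothetical helper.

```python
# Placeholder table: keys and prices are illustrative only.
COST_PER_1K_TOKENS = {
    "example-model": 0.002,
    "example-model-cached": 0.001,
}


def token_cost(model_key: str, num_tokens: int) -> float:
    """Return the cost of num_tokens for model_key, or raise ValueError."""
    if model_key not in COST_PER_1K_TOKENS:
        # A model variant without a table entry cannot be priced.
        raise ValueError(f"Unknown model: {model_key}")
    return COST_PER_1K_TOKENS[model_key] * (num_tokens / 1000)
```

Adding one entry for the missing variant is enough to make the lookup succeed; no other logic needs to change.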

cc. @efriis
2025-01-16 15:19:26 -05:00
Junon
667d2a57fd
add mode arg to OBSFileLoader.load() method (#29246)
- **Description:** add mode arg to OBSFileLoader.load() method
  - **Issue:** #29245
  - **Dependencies:** no dependencies required for this change

---------

Co-authored-by: Junon_Gz <junon_gz@qq.com>
2025-01-16 11:09:04 -05:00
Erick Friis
5eb4dc5e06
standard-tests: double messages test (#29237) 2025-01-15 15:14:29 -08:00
Nithish Raghunandanan
1051fa5729
couchbase: Migrate couchbase partner package to different repo (#29239)
**Description:** Migrate the couchbase partner package to
[Couchbase-Ecosystem](https://github.com/Couchbase-Ecosystem/langchain-couchbase)
org
2025-01-15 12:37:27 -08:00
Nadeem Sajjad
eaf2fb287f
community(pypdfloader): added page_label in metadata for pypdf loader (#29225)
# Description

## Summary
This PR adds support for handling multi-labeled page numbers in the
**PyPDFLoader**. Some PDFs use complex page numbering systems where the
actual content may begin after multiple introductory pages. The
page_label field helps accurately reflect the document’s page structure,
making it easier to handle such cases during document parsing.

## Motivation
This feature improves document parsing accuracy by allowing users to
access the actual page labels instead of relying only on the physical
page numbers. This is particularly useful for documents where the first
few pages have roman numerals or other non-standard page labels.

## Use Case
This feature is especially useful for **Retrieval-Augmented Generation**
(RAG) systems where users may reference page numbers when asking
questions. Some PDFs have both labeled page numbers (like roman numerals
for introductory sections) and index-based page numbers.

For example, a user might ask:

	"What is mentioned on page 5?"

The system can now check both:
	•	**Index-based page number** (page)
	•	**Labeled page number** (page_label)

This dual-check helps improve retrieval accuracy. Additionally, the
results can be validated with an **agent or tool** to ensure the
retrieved pages match the user’s query contextually.
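
The dual check could be sketched roughly as follows, assuming each document's metadata is a dictionary with the `page` and `page_label` keys described above. `match_page` and the sample metadata are hypothetical and only illustrate the idea.

```python
from typing import Any, Dict, List


def match_page(docs: List[Dict[str, Any]], requested: str) -> List[Dict[str, Any]]:
    """Return metadata dicts whose index or label matches the requested page."""
    matches = []
    for meta in docs:
        by_index = str(meta.get("page")) == requested
        by_label = meta.get("page_label") == requested
        if by_index or by_label:
            matches.append(meta)
    return matches


# Illustrative metadata: the front matter uses roman-numeral labels.
pages = [
    {"page": 0, "page_label": "i"},
    {"page": 1, "page_label": "ii"},
    {"page": 2, "page_label": "1"},
    {"page": 5, "page_label": "4"},
    {"page": 6, "page_label": "5"},
]

candidates = match_page(pages, "5")
```

Here `candidates` holds two entries, one matched by index and one by label, which is where a downstream agent or tool can decide which page the user meant.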

## Code Changes

- Added a page_label field to the metadata of the Document class in
**PyPDFLoader**.
- Implemented support for retrieving page_label from the
pdf_reader.page_labels.
- Created a test case (test_pypdf_loader_with_multi_label_page_numbers)
with a sample PDF containing multi-labeled pages
(geotopo-komprimiert.pdf) [[Source of
pdf](https://github.com/py-pdf/sample-files/blob/main/009-pdflatex-geotopo/GeoTopo-komprimiert.pdf)].
- Updated existing tests to ensure compatibility and verify page_label
extraction.

## Tests Added

- Added a new test case for a PDF with multi-labeled pages.
- Verified both page and page_label metadata fields are correctly
extracted.

## Screenshots

<img width="549" alt="image"
src="https://github.com/user-attachments/assets/65db9f5c-032e-4592-926f-824777c28f33"
/>
2025-01-15 14:18:07 -05:00
Mehdi
1a38948ee3
Mehdi zare/fmp data doc (#29219)
Title: community: add Financial Modeling Prep (FMP) API integration

Description: Adding LangChain integration for Financial Modeling Prep
(FMP) API to enable semantic search and structured tool creation for
financial data endpoints. This integration provides semantic endpoint
search using vector stores and automatic tool creation with proper
typing and error handling. Users can discover relevant financial
endpoints using natural language queries and get properly typed
LangChain tools for discovered endpoints.

Issue: N/A

Dependencies:

- fmp-data>=0.3.1
- langchain-core>=0.1.0
- faiss-cpu
- tiktoken

Twitter handle: @mehdizarem

Unit tests and example notebook have been added:

- Tests are in tests/integration_tests/est_tools.py and
tests/unit_tests/test_tools.py
- Example notebook is in docs/tools.ipynb

All format, lint and test checks pass:

- pytest
- mypy .

Dependencies are imported within functions and not added to
pyproject.toml. The changes are backwards compatible and only affect the
community package.

---------

Co-authored-by: mehdizare <mehdizare@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-15 15:31:01 +00:00
Mohammad Mohtashim
288613d361
(text-splitters): Small Fix in _process_html for HTMLSemanticPreservingSplitter to properly extract the metadata. (#29215)
- **Description:** Include `main` in the list of elements whose child
elements need to be processed for splitting the HTML.
- **Issue:** #29184
2025-01-15 10:18:06 -05:00
TheSongg
4867fe7ac8
[langchain_community.llms.xinference]: fix error in xinference.py (#29216)
- [ ] **PR title**: [langchain_community.llms.xinference]: fix error in
xinference.py

- [ ] **PR message**:
- The old code raised a `ValidationError`
(`pydantic_core._pydantic_core.ValidationError: 1 validation error for
Xinference`) when importing Xinference from xinference.py. This has been
resolved by adjusting the `client` field's type and default value.

File "/media/vdc/python/lib/python3.10/site-packages/pydantic/main.py",
line 212, in __init__
validated_self = self.__pydantic_validator__.validate_python(data,
self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for
Xinference
        client
Field required [type=missing, input_value={'server_url':
'http://10...t4', 'model_kwargs': {}}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.9/v/missing

- [ ] **tests**:

       from langchain_community.llms import Xinference
       llm = Xinference(
           server_url="http://0.0.0.0:9997",  # replace with your xinference server url
           model_uid=model_uid,  # replace with the model UID returned from launching the model
       )
2025-01-15 10:11:26 -05:00
Syed Baqar Abbas
4278046329
[fix] Convert table names to list for compatibility in SQLDatabase (#29229)
- [langchain_community.utilities.SQLDatabase] **[fix] Convert table
names to list for compatibility in SQLDatabase**:
  - The issue #29227 is being fixed here
  - The "package" modified is community
  - The issue lay in this block of code:

44b41b699c/libs/community/langchain_community/utilities/sql_database.py (L72-L77)

- **Description:** When `SQLDatabase` is initialized, it runs
`self._inspector.get_table_names(schema=schema)`, which is expected to
return a list. However, with some connectors (such as Snowflake) the
returned value can be another iterable type. This results in a type
error when concatenating `table_names` with `view_names`. I have added
explicit type casting to prevent this.
    - **Issue:** #29227
    - **Dependencies:** None
    - **Twitter handle:** @BaqarAbbas2001

## Additional Information
When the following method is called for a Snowflake database:

44b41b699c/libs/community/langchain_community/utilities/sql_database.py (L75)

Snowflake under the hood calls:
```python
from snowflake.sqlalchemy.snowdialect import SnowflakeDialect
SnowflakeDialect.get_table_names
```

This method returns a `dict_keys()` object, which cannot be
concatenated with a list and results in a `TypeError`.
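
A small sketch of the failure and of the cast that avoids it. `merge_names` is a hypothetical helper, and the dictionary below only stands in for whatever the dialect returns.

```python
def merge_names(table_names, view_names):
    """Concatenate two iterables of names after casting both to lists."""
    return list(table_names) + list(view_names)


# dict.keys() yields a dict_keys view, similar to what is described above.
tables = {"orders": 1, "users": 2}.keys()
views = ["active_users"]

try:
    broken = views + tables  # list + dict_keys is not supported
except TypeError as exc:
    error_message = str(exc)

all_names = merge_names(tables, views)
```

Casting both operands keeps the helper safe regardless of which side returns a non-list iterable.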

### Relevant Library Versions
- **snowflake-sqlalchemy**: 1.7.2  
- **snowflake-connector-python**: 3.12.4  
- **sqlalchemy**: 2.0.20  
- **langchain_community**: 0.3.14
2025-01-15 10:00:03 -05:00
Jin Hyung Ahn
05554265b4
community: Fix ConfluenceLoader load() failure caused by deleted pages (#29232)
## Description
This PR modifies the is_public_page function in ConfluenceLoader to
prevent exceptions caused by deleted pages during the execution of
ConfluenceLoader.process_pages().


**Example scenario:**
Consider the following usage of ConfluenceLoader:
```python
import os
from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
        url=os.getenv("BASE_URL"),
        token=os.getenv("TOKEN"),
        max_pages=1000,
        cql=f'type=page and lastmodified >= "2020-01-01 00:00"',
        include_restricted_content=False,
)

# Raised Exception : HTTPError: Outdated version/old_draft/trashed? Cannot find content Please provide valid ContentId.
documents = loader.load()
```

If a deleted page exists within the query result, the is_public_page
function would previously raise an exception when calling
get_all_restrictions_for_content, causing the loader.load() process to
fail for all pages.



By adding a pre-check for the page's "current" status, unnecessary API
calls to get_all_restrictions_for_content for non-current pages are
avoided.


This fix ensures that such pages are skipped without affecting the rest
of the loading process.
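
The pre-check can be sketched as below. This is not the loader's actual code: `fetch_restrictions` is a hypothetical callable standing in for the Confluence API call, and the shape of the restrictions payload is simplified for illustration.

```python
from typing import Any, Callable, Dict


def is_public_page(
    page: Dict[str, Any],
    fetch_restrictions: Callable[[str], Dict[str, Any]],
) -> bool:
    """Return True only for current pages that carry no read restrictions."""
    # Skip non-current pages before making any API call for them.
    if page.get("status") != "current":
        return False
    restrictions = fetch_restrictions(page["id"])
    # Simplified assumption: an empty "read" entry means the page is public.
    return not restrictions.get("read")
```

Because the status check runs first, the restrictions lookup is never attempted for a page that would make it fail.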





## Issue
N/A (No specific issue number)

## Dependencies
No new dependencies are introduced with this change.

## Twitter handle
[@zenoengine](https://x.com/zenoengine)
2025-01-15 09:56:23 -05:00
Mohammad Mohtashim
21eb39dff0
[Community]: AzureOpenAIWhisperParser Authentication Fix (#29135)
- **Description:** `AzureOpenAIWhisperParser` authentication fix as
stated in the issue.
- **Issue:** #29133
2025-01-15 09:44:53 -05:00
Erick Friis
b05543c69b
packages: disable mongodb for api docs (#29218) 2025-01-15 02:23:01 +00:00
Erick Friis
30badd7a32
packages: update mongodb folder (#29217) 2025-01-15 02:01:06 +00:00
pm390
76172511fd
community: Additional parameters for OpenAIAssistantV2Runnable (#29207)
**Description:** Added additional parameters that are useful when using
OpenAIAssistantV2Runnable.

This change is intended to let langchain users set parameters that
cannot be set in the assistants UI
(max_completion_tokens, max_prompt_tokens, parallel_tool_calls) as well
as parameters that are useful for experimenting, such as top_p and
temperature.

This PR originated from the need to use parallel_tool_calls in
langchain. This parameter is important for OpenAI assistants because,
unless it is set to False, strict mode is not respected by OpenAI
Assistants
(https://platform.openai.com/docs/guides/function-calling#parallel-function-calling).

> Note: Currently, if the model calls multiple functions in one turn
then strict mode will be disabled for those calls.

**Issue:** None
**Dependencies:** openai
2025-01-14 15:53:37 -05:00
Bagatur
4ab04ad6be
docs: oai api ref nit (#29210) 2025-01-14 17:55:16 +00:00
Michael Chin
d9b856abad
community: Deprecate Amazon Neptune resources in langchain-community (#29191)
Related: https://github.com/langchain-ai/langchain-aws/pull/322

The legacy `NeptuneOpenCypherQAChain` and `NeptuneSparqlQAChain` classes
are being replaced by the new LCEL format chains
`create_neptune_opencypher_qa_chain` and
`create_neptune_sparql_qa_chain`, respectively, in the `langchain_aws`
package.

This PR adds deprecation warnings to all Neptune classes and functions
that have been migrated to `langchain_aws`. All relevant documentation
has also been updated to replace `langchain_community` usage with the
new `langchain_aws` implementations.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-14 10:23:34 -05:00
Erick Friis
c55af44711
anthropic: pydantic mypy plugin (#29144) 2025-01-13 15:32:40 -08:00
ccurme
1bf6576709
cli[patch]: fix anchor links in templates (#29178)
These are outdated and can break docs builds.
2025-01-13 18:28:18 +00:00
Christopher Varjas
e156b372fb
langchain: support api key argument with OpenAI moderation chain (#29140)
**Description:** Makes it possible to instantiate
`OpenAIModerationChain` with an `openai_api_key` argument only and no
`OPENAI_API_KEY` environment variable defined.
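
The intended precedence could look roughly like this. `resolve_api_key` is a hypothetical helper written for illustration, not the chain's actual implementation.

```python
import os
from typing import Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env_var: str = "OPENAI_API_KEY",
) -> str:
    """Prefer an explicitly passed key, then fall back to the environment."""
    if explicit_key:
        return explicit_key
    env_key = os.environ.get(env_var)
    if env_key:
        return env_key
    raise ValueError(f"No API key passed and {env_var} is not set")
```

With this ordering, passing the key as an argument works even when the environment variable is absent.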

**Issue:** https://github.com/langchain-ai/langchain/issues/25176

**Dependencies:** `openai`

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-01-13 11:00:02 -05:00
Nikhil Shahi
335ca3a606
docs: add HyperbrowserLoader docs (#29143)
### Description
This PR adds docs for the
[langchain-hyperbrowser](https://pypi.org/project/langchain-hyperbrowser/)
package. It includes a document loader that uses Hyperbrowser to scrape
or crawl any urls and return formatted markdown or html content as well
as relevant metadata.
[Hyperbrowser](https://hyperbrowser.ai) is a platform for running and
scaling headless browsers. It lets you launch and manage browser
sessions at scale and provides easy to use solutions for any webscraping
needs, such as scraping a single page or crawling an entire site.

### Issue
None

### Dependencies
None

### Twitter Handle
`@hyperbrowser`
2025-01-13 10:45:39 -05:00
Tymon Żarski
689592f9bb
community: Fix rank-llm import paths for new 0.20.3 version (#29154)
# **PR title**: "community: Fix rank-llm import paths for new 0.20.3
version"
- The "community" package is being modified to handle updated import
paths for the new `rank-llm` version.

---

## Description
This PR updates the import paths for the `rank-llm` package to account
for changes introduced in version `0.20.3`. The changes ensure
compatibility with both pre- and post-revamp versions of `rank-llm`,
specifically version `0.12.8`. Conditional imports are introduced based
on the detected version of `rank-llm` to handle different path
structures for `VicunaReranker`, `ZephyrReranker`, and `SafeOpenai`.
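
A rough sketch of the version gate, using plain tuple comparison from the standard library instead of the `packaging` dependency mentioned below. It assumes purely numeric dotted versions; `parse_version` and `uses_revamped_layout` are hypothetical names, and the actual import paths are left out.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted release string such as '0.20.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def uses_revamped_layout(installed: str, last_legacy: str = "0.12.8") -> bool:
    """Return True when the installed version is newer than the last legacy one."""
    return parse_version(installed) > parse_version(last_legacy)
```

A caller would branch on the result and import from the old or the new module path accordingly. Comparing tuples rather than raw strings matters here, since `"0.9.0" > "0.12.8"` is true as a string comparison.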

## Issue
`RankLLMRerank` throws an error, with GPT models among others, when the
`rank-llm` version is > 0.12.8 - #29156

## Dependencies
This change relies on the `packaging` and `pkg_resources` libraries to
handle version checks.

## Twitter handle
@tymzar
2025-01-13 10:22:14 -05:00
Andrew
0e3115330d
Add additional_instructions on openai assistant runs create. (#29164)
- **Description**: In the functions `_create_run` and `_acreate_run`,
the parameters passed to the creation of
`openai.resources.beta.threads.runs` were limited.

  Source:
  ```python
  def _create_run(self, input: dict) -> Any:
      params = {
          k: v
          for k, v in input.items()
          if k in ("instructions", "model", "tools", "run_metadata")
      }
      return self.client.beta.threads.runs.create(
          input["thread_id"],
          assistant_id=self.assistant_id,
          **params,
      )
  ```
- OpenAI Documentation
([createRun](https://platform.openai.com/docs/api-reference/runs/createRun))

- Full list of parameters `openai.resources.beta.threads.runs` ([source
code](https://github.com/openai/openai-python/blob/main/src/openai/resources/beta/threads/runs/runs.py#L91))

 
- **Issue:** Fix #17574 
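
To illustrate the effect of the allow-list, here is a standalone sketch of the filtering step. `select_run_params` is a hypothetical helper, and the extended tuple only adds the `additional_instructions` key named in the PR title; the keys actually added by the change may differ.

```python
from typing import Any, Dict, Iterable

BASE_KEYS = ("instructions", "model", "tools", "run_metadata")
EXTENDED_KEYS = BASE_KEYS + ("additional_instructions",)


def select_run_params(run_input: Dict[str, Any], allowed: Iterable[str]) -> Dict[str, Any]:
    """Keep only the entries of run_input whose key is in the allow-list."""
    allowed_set = set(allowed)
    return {k: v for k, v in run_input.items() if k in allowed_set}


run_input = {
    "thread_id": "thread_123",
    "instructions": "Answer briefly.",
    "additional_instructions": "Reply in French.",
}

before = select_run_params(run_input, BASE_KEYS)
after = select_run_params(run_input, EXTENDED_KEYS)
```

With the original allow-list, `additional_instructions` is silently dropped; with the extended one it is forwarded to the run creation call.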



- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-01-13 10:11:47 -05:00
ccurme
e4ceafa1c8
langchain[patch]: update extended tests for compatibility with langchain-openai==0.3 (#29174) 2025-01-13 15:04:22 +00:00
Priyansh Agrawal
c115c09b6d
community: add missing format specifier in error log in CubeSemanticLoader (#29172)
- **Description:** Add a missing format specifier in an error log in
`langchain_community.document_loaders.CubeSemanticLoader`
- **Issue:** raises `TypeError: not all arguments converted during
string formatting`
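
The error can be reproduced with plain %-style formatting, which is what the `logging` module applies to a message and its arguments. `format_log` is a hypothetical helper used only to show the behaviour; the messages are made up.

```python
def format_log(message: str, *args: object) -> str:
    """Apply %-style formatting to message when arguments are supplied."""
    return message % args if args else message


# With a specifier, the argument is interpolated into the message.
ok = format_log("Request failed: %s", "timeout")

# Without a specifier, the extra argument cannot be consumed.
try:
    format_log("Request failed:", "timeout")
except TypeError as exc:
    error_message = str(exc)
```

Adding the `%s` placeholder to the log message is therefore sufficient to fix the failure.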


2025-01-13 09:32:57 -05:00
ThomasSaulou
349b5c91c2
fix chatperplexity: remove 'stream' from params in _stream method (#29173)
quick fix chatperplexity: remove 'stream' from params in _stream method
2025-01-13 09:31:37 -05:00
LIU Yuwei
f980144e9c
community: add init for unstructured file loader (#29101)
## Description
Add `__init__` for unstructured loader of
epub/image/markdown/pdf/ppt/word to restrict the input type to `str` or
`Path`.
In the
[signature](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html)
these unstructured loaders receive `file_path: str | List[str] | Path |
List[Path]`, but actually they only receive `str` or `Path`.
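
As an illustration of the narrower signature, the sketch below accepts a single `str` or `Path` and rejects lists. `SingleFileLoader` is a hypothetical class, and the runtime check is an assumption added for the example; the actual change may rely on the type annotation alone.

```python
from pathlib import Path
from typing import Union


class SingleFileLoader:
    """Hypothetical loader that accepts exactly one file path."""

    def __init__(self, file_path: Union[str, Path]) -> None:
        # Reject lists and other types that the loader cannot handle.
        if not isinstance(file_path, (str, Path)):
            raise TypeError(
                f"file_path must be str or Path, got {type(file_path).__name__}"
            )
        self.file_path = str(file_path)
```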

## Issue
None

## Dependencies
No changes.
2025-01-13 09:26:00 -05:00
Erick Friis
bbc3e3b2cf
openai: disable streaming for o1 by default (#29147)
Currently 400s
https://community.openai.com/t/streaming-support-for-o1-o1-2024-12-17-resulting-in-400-unsupported-value/1085043

o1-mini and o1-preview stream fine
2025-01-11 02:24:11 +00:00
Isaac Francisco
62074bac60
replace all LANGCHAIN_ flags with LANGSMITH_ flags (#29120) 2025-01-11 01:24:40 +00:00
Bagatur
5c2fbb5b86
docs: Update openai README.md (#29146) 2025-01-10 17:24:16 -08:00
Erick Friis
0a54aedb85
anthropic: pdf integration test (#29142) 2025-01-10 21:56:31 +00:00
ccurme
8de8519daf
tests[patch]: release 0.3.8 (#29141) 2025-01-10 21:53:41 +00:00
Jiang
7d3fb21807
Add lindorm as new integration (#29123)
The original PR was closed by mistake: [original PR
link](https://github.com/langchain-ai/langchain/pull/29085)

---------

Co-authored-by: jiangzhijie <jiangzhijie.jzj@alibaba-inc.com>
2025-01-10 16:30:37 -05:00
ccurme
4819b500e8
pinecone[patch]: release 0.2.2 (#29139) 2025-01-10 14:59:57 -05:00
Ashvin
46fd09ffeb
partner: Update aiohttp in langchain pinecone. (#28863)
- **partner**: "Update Aiohttp for resolving vulnerability issue"
    
- **Description:** I have updated the upper limit of aiohttp from `3.10`
to `3.10.5` in the pyproject.toml file of langchain-pinecone. Hopefully
this will resolve #28771 . Please review this as I'm quite unsure.

---------

Co-authored-by: = <=>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-10 14:54:52 -05:00
ccurme
f3d370753f
xai[minor]: release 0.2 (#29132)
Update `langchain-openai` to 0.3. See [release
notes](https://github.com/langchain-ai/langchain/releases/tag/langchain-openai%3D%3D0.3.0)
for details. Should only impact default values of `temperature`, `n`,
and `max_retries`.
2025-01-10 11:47:27 -05:00
ccurme
6e63ccba84
openai[minor]: release 0.3 (#29100)
## Goal

Solve the following problems with `langchain-openai`:

- Structured output with `o1` [breaks out of the
box](https://langchain.slack.com/archives/C050X0VTN56/p1735232400232099).
- `with_structured_output` by default does not use OpenAI’s [structured
output
feature](https://platform.openai.com/docs/guides/structured-outputs).
- We override API defaults for temperature and other parameters.

## Breaking changes:

- Default method for structured output is changing to OpenAI’s dedicated
[structured output
feature](https://platform.openai.com/docs/guides/structured-outputs).
For schemas specified via TypedDict or JSON schema, strict schema
validation is disabled by default but can be enabled by specifying
`strict=True`.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- Models that don’t support `method="json_schema"` (e.g., `gpt-4` and
`gpt-3.5-turbo`, currently the default model for ChatOpenAI) will raise
an error unless `method` is explicitly specified.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- Schemas specified via Pydantic `BaseModel` that have fields with
non-null defaults or metadata (like min/max constraints) will raise an
error.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- `strict` now defaults to False for `method="json_schema"` when schemas
are specified via TypedDict or JSON schema.
- To recover previous behavior, use `with_structured_output(schema,
strict=True)`
- Schemas specified via Pydantic V1 will raise a warning (and use
`method="function_calling"`) unless `method` is explicitly specified.
- To remove the warning, pass `method="function_calling"` into
`with_structured_output`.
- Streaming with default structured output method / Pydantic schema no
longer generates intermediate streamed chunks.
- To recover previous behavior, pass `method="function_calling"` into
`with_structured_output`.
- We no longer override default temperature (was 0.7 in LangChain, now
will follow OpenAI, currently 1.0).
- To recover previous behavior, initialize `ChatOpenAI` or
`AzureChatOpenAI` with `temperature=0.7`.
- Note: conceptually there is a difference between forcing a tool call
and forcing a response format. Tool calls may have more concise
arguments vs. generating content adhering to a schema. Prompts may need
to be adjusted to recover desired behavior.

---------

Co-authored-by: Jacob Lee <jacoblee93@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2025-01-10 10:50:32 -05:00
ccurme
815bfa1913
openai[patch]: support streaming with json_schema response format (#29044)
- Stream JSON string content. Final chunk includes parsed representation
(following OpenAI
[docs](https://platform.openai.com/docs/guides/structured-outputs#streaming)).
- Mildly (?) breaking change: if you were using streaming with
`response_format` before, usage metadata will disappear unless you set
`stream_usage=True`.
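
A minimal sketch of the consumption pattern described above, using only the standard library: intermediate chunks carry fragments of a JSON string, and a parsed object is only available once the stream ends. The `consume_stream` helper and the chunk contents are illustrative assumptions, not part of `langchain-openai`.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple


def consume_stream(chunks: Iterable[str]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Accumulate streamed JSON text and parse it once the stream ends."""
    buffer = ""
    for piece in chunks:
        buffer += piece  # until the last chunk, this is incomplete JSON
    try:
        parsed = json.loads(buffer)
    except json.JSONDecodeError:
        parsed = None  # stream ended before a complete document arrived
    return buffer, parsed


text, parsed = consume_stream(['{"answer"', ': 4', '2}'])
```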

## Response format

Before:

![Screenshot 2025-01-06 at 11 59
01 AM](https://github.com/user-attachments/assets/e54753f7-47d5-421d-b8f3-172f32b3364d)


After:

![Screenshot 2025-01-06 at 11 58
13 AM](https://github.com/user-attachments/assets/34882c6c-2284-45b4-92f7-5b5b69896903)


## with_structured_output

For pydantic output, behavior of `with_structured_output` is unchanged
(except for warning disappearing), because we pluck the parsed
representation straight from OpenAI, and OpenAI doesn't return it until
the stream is completed. Open to alternatives (e.g., parsing from
content or intermediate dict chunks generated by OpenAI).

Before:

![Screenshot 2025-01-06 at 12 38
11 PM](https://github.com/user-attachments/assets/913d320d-f49e-4cbb-a800-b394ae817fd1)

After:

![Screenshot 2025-01-06 at 12 38
58 PM](https://github.com/user-attachments/assets/f7a45dd6-d886-48a6-8d76-d0e21ca767c6)
2025-01-09 10:32:30 -05:00
Panos Vagenas
858f655a25
docs: add Docling loader docs (#29104)
### Description
This adds the docs for the Docling document loader.
[Docling](https://github.com/DS4SD/docling) parses PDF, DOCX, PPTX,
HTML, and other formats into a rich unified representation including
document layout, tables etc., making them ready for generative AI
workflows like RAG.

Some references:
- https://research.ibm.com/blog/docling-generative-AI
-
https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai
- [Docling Technical Report](https://arxiv.org/abs/2408.09869)

The introduced `DoclingLoader` enables users to:
- use various document types in their LLM applications with ease and
speed, and
- leverage Docling's rich representation for advanced, document-native
grounding.

### Issue
Replacing PR #27987 as discussed with @efriis
[here](https://github.com/langchain-ai/langchain/pull/27987#issuecomment-2489354930).

### Dependencies
None

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-01-09 10:15:35 -05:00
Joshua Campbell
00dcc44739
Langchain_community: Fix issue with missing backticks in arango client (#29110)
- **Description:** Adds backticks to generate_schema function in the
arango graph client
- **Issue:** We experienced an issue with the generate schema function
when talking to our arango database where these backticks were missing
    - **Dependencies:** none
    - **Twitter handle:** @anangelofgrace
2025-01-09 10:00:10 -05:00
LIU Yuwei
2b09f798e1
community: add init for UnstructuredHTMLLoader to solve pathlib paths (#29091)
## Description
Add `__init__` for `UnstructuredHTMLLoader` to restrict the input type
to `str` or `Path`, and transfer the `self.file_path` to `str` just like
`UnstructuredXMLLoader` does.
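
The restriction can be sketched roughly as follows. `PathOnlyLoader` is a hypothetical class written for illustration, not the actual `UnstructuredHTMLLoader` code.

```python
from pathlib import Path
from typing import Union


class PathOnlyLoader:
    """Hypothetical loader illustrating the str/Path restriction."""

    def __init__(self, file_path: Union[str, Path]) -> None:
        if not isinstance(file_path, (str, Path)):
            raise TypeError("file_path must be a str or pathlib.Path")
        # Store a plain string so downstream code never sees a Path object.
        self.file_path = str(file_path)


loader = PathOnlyLoader(Path("docs") / "index.html")
```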

## Issue
Fix #29090 

## Dependencies
No changes.
2025-01-08 10:19:27 -05:00
Jin Hyung Ahn
c8ca1cd42f
community: fix "confluence-loader" enable include_labels for documents loaded via CQL (#29089)
## Description
This PR enables label inclusion for documents loaded via CQL in the
confluence-loader.

- Updated _lazy_load to pass the include_labels parameter instead of
False in process_pages calls for documents loaded via CQL.
- Ensured that labels can now be fetched and added to the metadata for
documents queried with cql.
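
The shape of the change can be sketched as follows: the caller's flag is forwarded instead of a hard-coded `False`. The function bodies and the page dictionaries are simplified assumptions, not the loader's real implementation.

```python
from typing import Any, Dict, Iterator, List


def process_pages(pages: List[Dict[str, Any]], include_labels: bool) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in: build metadata, optionally with labels."""
    for page in pages:
        metadata: Dict[str, Any] = {"title": page["title"]}
        if include_labels:
            metadata["labels"] = page.get("labels", [])
        yield metadata


def lazy_load_cql(pages: List[Dict[str, Any]], include_labels: bool = False) -> Iterator[Dict[str, Any]]:
    # Forward the caller's flag rather than always passing False.
    yield from process_pages(pages, include_labels=include_labels)


docs = list(lazy_load_cql([{"title": "Spec", "labels": ["draft"]}], include_labels=True))
```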

## Related Modification History
This PR builds on the previous functionality introduced in
[#28259](https://github.com/langchain-ai/langchain/pull/28259), which
added support for including labels with the include_labels option.
However, this functionality did not work as expected for CQL queries,
and this PR fixes that issue.

If the False handling was intentional due to another issue, please let
me know. I have verified with our Confluence instance that this change
allows labels to be correctly fetched for documents loaded via CQL.

## Issue
Fixes #29088


## Dependencies
No changes.

## Twitter Handle
[@zenoengine](https://x.com/zenoengine)
2025-01-08 10:16:39 -05:00
Inah Jeon
9d290abccd
partner: Update Upstage Model Names and Remove Deprecated Model (#29093)
This PR updates model names in the upstage library to reflect the latest
naming conventions and removes deprecated models.

Changes:

Renamed Models:
- `solar-1-mini-chat` -> `solar-mini`
- `solar-1-mini-embedding-query` -> `embedding-query`

Removed Deprecated Models:
- `layout-analysis` (replaced by `document-parse`)
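
For callers with old names in their configuration, a small lookup like the one below could ease migration. It uses only the renames listed above. The `migrate_model_name` helper is a hypothetical illustration and is not part of `langchain-upstage`.

```python
# Renames taken from the list above; the helper itself is hypothetical.
RENAMED_MODELS = {
    "solar-1-mini-chat": "solar-mini",
    "solar-1-mini-embedding-query": "embedding-query",
}


def migrate_model_name(name: str) -> str:
    """Return the new name for a renamed model, or the input unchanged."""
    return RENAMED_MODELS.get(name, name)


new_name = migrate_model_name("solar-1-mini-chat")
```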

Reference:
- https://console.upstage.ai/docs/getting-started/overview
-
https://github.com/langchain-ai/langchain-upstage/releases/tag/libs%2Fupstage%2Fv0.5.0

2025-01-08 10:13:22 -05:00
Prashanth Rao
b1dafaef9b
Kùzu package integration docs (#29076)
## Langchain Kùzu

### Description
 
This PR adds docs for the `langchain-kuzu` package [on
PyPI](https://pypi.org/project/langchain-kuzu/) that was recently
published, allowing Kùzu users to more easily use and work with
LangChain QA chains. The package will also make it easier for the Kùzu
team to continue supporting and updating the integration over future
releases.

### Twitter Handle

Please tag [@kuzudb](https://x.com/kuzudb) on Twitter once this PR is
merged, so LangChain users can be notified!

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2025-01-08 01:14:00 +00:00
Erick Friis
cc0f81f40f
partners/groq: release 0.2.3 (#29081) 2025-01-07 23:36:51 +00:00
Erick Friis
fcc9cdd100
multiple: disable socket for unit tests (#29080) 2025-01-07 15:31:50 -08:00
Erick Friis
539ebd5431
groq: user agent (#29079) 2025-01-07 23:21:57 +00:00
Erick Friis
c5bee0a544
pinecone: bump core version (#29077) 2025-01-07 20:23:33 +00:00
Cory Waddingham
ce9e9f9314
pinecone: Review pinecone tests (#29073)
Title: langchain-pinecone: improve test structure and async handling

Description: This PR improves the test infrastructure for the
langchain-pinecone package by:
1. Implementing LangChain's standard test patterns for embeddings
2. Adding comprehensive configuration testing
3. Improving async test coverage
4. Fixing integration test issues with namespaces and async markers

The changes make the tests more robust, maintainable, and aligned with
LangChain's testing standards while ensuring proper async behavior in
the embeddings implementation.

Key improvements:
- Added standard EmbeddingsTests implementation
- Split custom configuration tests into a separate test class
- Added proper async test coverage with pytest-asyncio
- Fixed namespace handling in vector store integration tests
- Improved test organization and documentation

Dependencies: None (uses existing test dependencies)

Tests and Documentation:
-  Added standard test implementation following LangChain's patterns
-  Added comprehensive unit tests for configuration and async behavior
-  All tests passing locally
- No documentation changes needed (internal test improvements only)

Twitter handle: N/A

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-07 11:46:30 -08:00
Philippe PRADOS
2921597c71
community[patch]: Refactoring PDF loaders: 01 prepare (#29062)
- **Refactoring PDF loaders step 1**: "community: Refactoring PDF
loaders to standardize approaches"

- **Description:** Declare CloudBlobLoader in __init__.py. file_path is
Union[str, PurePath] everywhere
- **Twitter handle:** pprados

This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This part focuses on preparing the update of all parsers.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).

@eyurtsev it's the start of a PR series.
2025-01-07 11:00:04 -05:00
ccurme
55677e31f7
text-splitters[patch]: release 0.3.5 (#29054)
Resolves https://github.com/langchain-ai/langchain/issues/29053
2025-01-07 09:48:26 -05:00
Erick Friis
187131c55c
Revert "integrations[patch]: remove non-required chat param defaults" (#29048)
Reverts langchain-ai/langchain#26730

discuss best way to release default changes (esp openai temperature)
2025-01-06 14:45:34 -08:00
Bagatur
3d7ae8b5d2
integrations[patch]: remove non-required chat param defaults (#26730)
anthropic:
  - max_retries

openai:
  - n
  - temperature
  - max_retries

fireworks
  - temperature

groq
  - n
  - max_retries
  - temperature

mistral
  - max_retries
  - timeout
  - max_concurrent_requests
  - temperature
  - top_p
  - safe_mode

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-06 22:26:22 +00:00
UV
b9db8e9921
DOC: Improve human input prompt in FewShotChatMessagePromptTemplate example (#29023)
Fixes #29010 

This PR updates the example for FewShotChatMessagePromptTemplate by
modifying the human input prompt to include a more descriptive and
user-friendly question format ('What is {input}?') instead of just
'{input}'. This change enhances clarity and usability in the
documentation example.

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-06 12:29:15 -08:00
ccurme
1f78d4faf4
voyageai[patch]: release 0.1.4 (#29046) 2025-01-06 20:20:19 +00:00
Eugene Evstafiev
6a152ce245
docs: add langchain-pull-md Markdown loader (#29024)
- [x] **PR title**: "docs: add langchain-pull-md Markdown loader"

- [x] **PR message**: 
- **Description:** This PR introduces the `langchain-pull-md` package to
the LangChain community. It includes a new document loader that utilizes
the pull.md service to convert URLs into Markdown format, particularly
useful for handling web pages rendered with JavaScript frameworks like
React, Angular, or Vue.js. This loader helps in efficient and reliable
Markdown conversion directly from URLs without local rendering, reducing
server load.
    - **Issue:** NA
    - **Dependencies:** requests >=2.25.1
    - **Twitter handle:** https://x.com/eugeneevstafev?s=21

- [x] **Add tests and docs**: 
1. Added unit tests to verify URL checking and conversion
functionalities.
2. Created a comprehensive example notebook detailing the usage of the
new loader.

- [x] **Lint and test**: 
- Completed local testing using `make format`, `make lint`, and `make
test` commands as per the LangChain contribution guidelines.


**Related Links:**
- [Package Repository](https://github.com/chigwell/langchain-pull-md)
- [PyPI Package](https://pypi.org/project/langchain-pull-md/)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-06 19:32:43 +00:00
Ashvin
20a715a103
community: Fix redundancy in code. (#29022)
In my previous PR (#28953), I added an unwanted condition for validating
the Azure ML Endpoint. In this PR, I have rectified the issue.
2025-01-06 12:58:16 -05:00
Adrián Panella
acddfc772e
core: allow artifact in create_retriever_tool (#28903)
Add option to return content and artifacts, to also be able to access
the full info of the retrieved documents.

They are returned as a list of dicts in the `artifacts` property if
parameter `response_format` is set to `"content_and_artifact"`.

Defaults to `"content"` to keep current behavior.
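
A rough sketch of the two response formats, written without LangChain imports. `run_retriever_tool` and the document dictionaries are hypothetical, and the real tool's return types may differ.

```python
from typing import Any, Dict, List, Tuple, Union

Doc = Dict[str, Any]


def run_retriever_tool(
    docs: List[Doc], response_format: str = "content"
) -> Union[str, Tuple[str, List[Doc]]]:
    """Join document text; optionally also return the documents themselves."""
    content = "\n\n".join(d["page_content"] for d in docs)
    if response_format == "content_and_artifact":
        return content, docs  # full documents travel alongside the text
    return content  # default keeps the previous text-only behavior


docs = [
    {"page_content": "alpha", "metadata": {"source": "a.txt"}},
    {"page_content": "beta", "metadata": {"source": "b.txt"}},
]
text_only = run_retriever_tool(docs)
text, artifacts = run_retriever_tool(docs, response_format="content_and_artifact")
```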

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-03 22:10:31 +00:00
ccurme
3e618b16cd
community[patch]: release 0.3.14 (#29019) 2025-01-03 15:34:24 -05:00
ccurme
18eb9c249d
langchain[patch]: release 0.3.14 (#29018) 2025-01-03 15:15:44 -05:00
ccurme
8e50e4288c
core[patch]: release 0.3.29 (#29017) 2025-01-03 14:58:39 -05:00
ccurme
85403bfa99
core[patch]: substantially speed up @deprecated (#29016)
Resolves https://github.com/langchain-ai/langchain/issues/26918

Unit tests don't raise any additional `LangChainDeprecationWarning`.
Would like guidance on how to test this more thoroughly if needed.

Note: speed up for `bind_tools` path is shown below. This is
**redundant** with the speedup in
https://github.com/langchain-ai/langchain/pull/29015. I include it for
demonstration purposes.

Before:

![Screenshot 2025-01-03 at 12 54
50 PM](https://github.com/user-attachments/assets/87f289eb-4cad-4304-85f7-5c58c59080f1)

After:

![Screenshot 2025-01-03 at 12 55
35 PM](https://github.com/user-attachments/assets/95ad0506-e1d1-4c5c-bb27-6a634d8810c9)
2025-01-03 14:38:53 -05:00
ccurme
4bb391fd4e
core[patch]: remove deprecated functions from tool binding hotpath (#29015)
(Inspired by https://github.com/langchain-ai/langchain/issues/26918)

We rely on some deprecated public functions in the hot path for tool
binding (`convert_pydantic_to_openai_function`,
`convert_python_function_to_openai_function`, and
`format_tool_to_openai_function`). My understanding is that what is
deprecated is not the functionality they implement, but use of them in
the public API -- we expect to continue to rely on them.

Here we update these functions to be private and not deprecated. We keep
the public, deprecated functions as simple wrappers that can be safely
deleted.
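
The pattern can be sketched with the standard `warnings` module. The names `_convert_function` and `convert_function` are illustrative, and the real code uses LangChain's own `@deprecated` decorator rather than a direct `warnings.warn` call.

```python
import warnings
from typing import Any, Callable, Dict


def _convert_function(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Private helper for the hot path; emits no warning."""
    return {"name": fn.__name__, "description": fn.__doc__ or ""}


def convert_function(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Public, deprecated wrapper kept for backwards compatibility."""
    warnings.warn("convert_function is deprecated.", DeprecationWarning, stacklevel=2)
    return _convert_function(fn)


def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


# Internal callers use the private helper and skip the deprecation overhead.
schema = _convert_function(add)
```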

The `@deprecated` wrapper adds considerable latency due to its use of
the `inspect` module. This update speeds up `bind_tools` by a factor of
~100x:

Before:

![Screenshot 2025-01-03 at 11 22
55 AM](https://github.com/user-attachments/assets/94b1c433-ce12-406f-b64c-ca7103badfe0)

After:

![Screenshot 2025-01-03 at 11 23
41 AM](https://github.com/user-attachments/assets/02d0deab-82e4-45ca-8cc7-a20b91a5b5db)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-03 19:29:01 +00:00
Eugene Evstafiev
a86904e735
docs: fix typo (#29012)
Thank you for contributing to LangChain!

- [x] **PR title**: "docs: fix typo"

- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a minor fix of typo
    - **Issue:** NA
    - **Dependencies:** NA
    - **Twitter handle:** NA


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. ~~a test for the integration, preferably unit tests that do not rely
on network access,~~
2. ~~an example notebook showing its use. It lives in
`docs/docs/integrations` directory.~~


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2025-01-03 09:52:24 -08:00
Erick Friis
919d1c7da6
box: remove box readme for api docs build (#29014) 2025-01-03 09:50:04 -08:00
Erick Friis
d8bc556c94
packages: update box location (#29013) 2025-01-03 09:45:13 -08:00
Amaan
8d7daa59fb
docs: add langchain dappier retriever integration notebooks (#28931)
Add a retriever to interact with Dappier APIs with an example notebook.

The retriever can be invoked with:

```python
from langchain_dappier import DappierRetriever

retriever = DappierRetriever(
    data_model_id="dm_01jagy9nqaeer9hxx8z1sk1jx6",
    k=5
)

retriever.invoke("latest tech news")
```

This retrieves 5 documents related to the latest news in the tech sector.
The included notebook also covers filter controls such as selecting a data
model, the number of documents to return, the site domain reference, the
minimum number of articles from the reference domain, and the search
algorithm, as well as including the retriever in a chain.

The integration package can be found over here -
https://github.com/DappierAI/langchain-dappier
2025-01-03 10:21:41 -05:00
ccurme
0185010b88
community[patch]: additional check for prompt caching support (#29008)
Prompt caching explicitly excludes `gpt-4o-2024-05-13`:
https://platform.openai.com/docs/guides/prompt-caching

Resolves https://github.com/langchain-ai/langchain/issues/28997
2025-01-03 10:14:07 -05:00
Tari Yekorogha
ba9dfd9252
docs: Add FalkorDB Chat Message History and Update Package Registry (#28914)
This commit updates the documentation and package registry for the
FalkorDB Chat Message History integration.

**Changes:**

- Added a comprehensive example notebook
falkordb_chat_message_history.ipynb demonstrating how to use FalkorDB
for session-based chat message storage.

- Added a provider notebook for FalkorDB

- Updated libs/packages.yml to register FalkorDB as an integration
package, following LangChain's new guidelines for community
integrations.

**Notes:**

- This update aligns with LangChain's process for registering new
integrations via documentation updates and package registry
modifications.

- No functional or core package changes were made in this commit.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-02 15:46:47 -05:00
Ashvin
d26c102a5a
community: Update azureml endpoint (#28953)
- In this PR, I have updated the AzureML Endpoint with the latest
endpoint.
- **Description:** I have changed the existing `/chat/completions` to
`/models/chat/completions` in
libs/community/langchain_community/llms/azureml_endpoint.py
    - **Issue:** #25702

---------

Co-authored-by: = <=>
2025-01-02 14:47:02 -05:00
ccurme
7c28321f04
core[patch]: fix deprecation admonition in API ref (#28992)
Before:

![Screenshot 2025-01-02 at 1 49
30 PM](https://github.com/user-attachments/assets/cb30526a-fc0b-439f-96d1-962c226d9dc7)

After:

![Screenshot 2025-01-02 at 1 49
38 PM](https://github.com/user-attachments/assets/32c747ea-6391-4dec-b778-df457695d197)
2025-01-02 14:37:55 -05:00
Mohammad Mohtashim
0e74757b0a
(Community): DuckDuckGoSearchAPIWrapper backend changed from api to auto (#28961)
- **Description:** `DuckDuckGoSearchAPIWrapper` default value for
backend has been changed to avoid User Warning
- **Issue:** #28957
2025-01-02 14:08:22 -05:00
Mohammad Mohtashim
aa551cbcee
(Core) Small Change in Docstring for method partial for BasePromptTemplate (#28969)
- **Description:** Very small change in Docstring for
`BasePromptTemplate`
- **Issue:** #28966
2025-01-02 12:16:30 -05:00
minpeter
a873e0fbfb
community: update documentation and model IDs for FriendliAI provider (#28984)
### Description  

- In the example, remove `llama-2-13b-chat`,
`mixtral-8x7b-instruct-v0-1`.
- Fix llm friendli streaming implementation.
- Update examples in documentation and remove duplicates.

### Issue  
N/A  

### Dependencies  
None  

### Twitter handle  
`@friendliai`
2025-01-02 12:15:59 -05:00
Hrishikesh Kalola
437ec53e29
langchain.agents: corrected documentation (#28986)
**Description:**
This PR updates the codebase to reflect the deprecation of the AgentType
feature. It includes the following changes:

Documentation Update:

Added a deprecation notice to the AgentType class comment.
Provided a reference to the official LangChain migration guide for
transitioning to LangGraph agents.
Reference Link: https://python.langchain.com/docs/how_to/migrate_agent/

**Twitter handle:** @hrrrriiiishhhhh

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-02 12:13:42 -05:00
Mohammad Mohtashim
49a26c1fca
(Community): Fix Keyword argument for AzureAIDocumentIntelligenceParser (#28959)
- **Description:** Fix the `body` keyword argument for
`AzureAIDocumentIntelligenceParser`
- **Issue:** #28948
2025-01-02 11:27:12 -05:00
ccurme
efc687a13b
community[patch]: fix instantiation for Slack tools (#28990)
Believe the current implementation raises PydanticUserError following
[this](https://github.com/pydantic/pydantic/releases/tag/v2.10.1)
Pydantic release.

Resolves https://github.com/langchain-ai/langchain/issues/28989
2025-01-02 16:14:17 +00:00
Yunlin Mao
c59093d67f
docs: add modelscope endpoint (#28941)
## Description

To integrate ModelScope inference API endpoints for Embeddings, LLMs,
and ChatModels, install the package
`langchain-modelscope-integration` (as discussed in issue #28928). This
is necessary because the package name `langchain-modelscope` was already
registered by another party.

ModelScope is a premier platform designed to connect model checkpoints
with model applications. It provides the necessary infrastructure to
share open models and promote model-centric development. For more
information, visit GitHub page:
[ModelScope](https://github.com/modelscope).
2025-01-02 10:08:41 -05:00
Bagatur
1c797ac68f
infra: speed up unit tests (#28974)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-02 04:13:08 +00:00
Morgante Pell
79fc9b6b04
cli: bump gritql version (#28981)
**Description:**

bump gritql dependency, to use new binary names from
[here](https://github.com/getgrit/gritql/pull/565)

**Issue:**

fixes https://github.com/langchain-ai/langchain/issues/27822
2025-01-01 20:02:46 -08:00
Bagatur
edbe7d5f5e
core,anthropic[patch]: fix with_structured_output typing (#28950) 2024-12-28 15:46:51 -05:00
dabzr
ffbe5b2106
partners: fix default value for stop_sequences in ChatGroq (#28924)
- **Description:**  
This PR addresses an issue with the `stop_sequences` field in the
`ChatGroq` class. Currently, the field is defined as:
```python
stop: Optional[Union[List[str], str]] = Field(None, alias="stop_sequences")
```  
This causes the language server (LSP) to raise an error indicating that
the `stop_sequences` parameter must be implemented. The issue occurs
because `Field(None, alias="stop_sequences")` is different compared to
`Field(default=None, alias="stop_sequences")`.


![image](https://github.com/user-attachments/assets/bfc34cb1-c664-4c31-b856-8f18419c7350)
To resolve the issue, the field is updated to:  
```python
stop: Optional[Union[List[str], str]] = Field(default=None, alias="stop_sequences")
```  
While this issue does not affect runtime behavior, it ensures
compatibility with LSPs and improves the development experience.
- **Issue:** N/A  
- **Dependencies:** None
2024-12-26 16:43:34 -05:00
Andy Wermke
5940ed3952
community: Fix error handling bug in ChatDeepInfra (#28918)
In the async ClientResponse, `response.text` is not a string property,
but an asynchronous function returning a string.
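
The distinction can be shown with a stand-in class. `FakeClientResponse` below is a hypothetical mimic of the async response object, not the real aiohttp class, and `error_message` is an illustrative helper.

```python
import asyncio


class FakeClientResponse:
    """Hypothetical stand-in whose `text` is a coroutine method."""

    status = 500

    async def text(self) -> str:
        return "upstream error"


async def error_message(response: FakeClientResponse) -> str:
    # `response.text` alone is a bound method; it must be called and awaited.
    body = await response.text()
    return f"HTTP {response.status}: {body}"


message = asyncio.run(error_message(FakeClientResponse()))
```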
2024-12-26 14:45:12 -05:00
zep.hyr
7b4d2d5d44
Community : Add cost information for missing OpenAI model (#28882)
In the previous commit, the cached model key for this model was omitted.
When using the "gpt-4o-2024-11-20" model, the token count in the
callback appeared as 0, and the cost was recorded as 0.

We add model and cost information so that the token count and cost can
be displayed for the respective model.

- The message before modification is as follows.
```
Tokens Used: 0
Prompt Tokens: 0
Prompt Tokens Cached: 0 
Completion Tokens: 0  
Reasoning Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0
```

- The message after modification is as follows.
```
Tokens Used: 3783 
Prompt Tokens: 3625
Prompt Tokens Cached: 2560
Completion Tokens: 158
Reasoning Tokens: 0
Successful Requests: 1
Total Cost (USD): $0.010642500000000001
```
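
The reported total is consistent with per-1K-token rates of $0.0025 for prompt tokens and $0.01 for completion tokens, with no discount applied to cached tokens. Those rates are an inference from the numbers above, not values quoted from the commit. A small sketch of the arithmetic under that assumption:

```python
from typing import Dict

# Assumed per-1K-token rates (USD), chosen because they reproduce the total above.
ASSUMED_COST_PER_1K: Dict[str, float] = {
    "gpt-4o-2024-11-20": 0.0025,
    "gpt-4o-2024-11-20-completion": 0.01,
}


def total_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Look up both rates; in this sketch an unknown model costs 0.0."""
    prompt_rate = ASSUMED_COST_PER_1K.get(model)
    completion_rate = ASSUMED_COST_PER_1K.get(f"{model}-completion")
    if prompt_rate is None or completion_rate is None:
        return 0.0
    return prompt_tokens / 1000 * prompt_rate + completion_tokens / 1000 * completion_rate


# 3625 prompt tokens and 158 completion tokens, as in the "after" message.
cost = total_cost("gpt-4o-2024-11-20", 3625, 158)
```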
2024-12-26 14:28:31 -05:00
Erick Friis
3726a944c0
docs: sorted by downloads [wip] (#28869) 2024-12-23 13:13:35 -08:00
Andreas Motl
6352edf77f
docs: CrateDB: Register package langchain-cratedb, and add minimal "provider" documentation (#28877)
Hi Erick. Coming back from a previous attempt, we now made a separate
package for the CrateDB adapter, called `langchain-cratedb`, as advised.
Other than registering the package within `libs/packages.yml`, this
patch includes a minimal amount of documentation to accompany the advent
of this new package. Let us know about any mistakes we made, or changes
you would like to see. Thanks, Andreas.

## About
- **Description:** Register a new database adapter package,
`langchain-cratedb`, providing traditional vector store, document
loader, and chat message history features for a start.
- **Addressed to:** @efriis, @eyurtsev
- **References:** GH-27710
- **Preview:** [Providers » More »
CrateDB](https://langchain-git-fork-crate-workbench-register-la-4bf945-langchain.vercel.app/docs/integrations/providers/cratedb/)

## Status
- **PyPI:** https://pypi.org/project/langchain-cratedb/
- **GitHub:** https://github.com/crate/langchain-cratedb
- **Documentation (CrateDB):**
https://cratedb.com/docs/guide/integrate/langchain/
- **Documentation (LangChain):** _This PR._

## Backlog?
Is this applicable for this kind of patch?
> - [ ] **Add tests and docs**: If you're adding a new integration,
please include
> 1. a test for the integration, preferably unit tests that do not rely
on network access,
> 2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

## Q&A
1. Notebooks that use the LangChain CrateDB adapter are currently at
[CrateDB LangChain
Examples](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain),
and the documentation refers to them. Because they are derived from very
old blueprints coming from LangChain 0.0.x times, we guess they need a
refresh before adding them to `docs/docs/integrations`. Is it applicable
to merge this minimal package registration + documentation patch, which
already includes valid code snippets in `cratedb.mdx`, and add
corresponding notebooks on behalf of a subsequent patch later?

2. How would it work getting into the tabular list of _Integration
Packages_ enumerated on the [documentation entrypoint page about
Providers](https://python.langchain.com/docs/integrations/providers/)?

/cc Please also review, @ckurze, @wierdvanderhaar, @kneth,
@simonprickett, if you can find the time. Thanks!
2024-12-23 10:55:44 -05:00
Wang Ran (汪然)
e5c9da3eb6
core[patch]: remove redundant imports (#28861)
`Graph` has been imported at Line: 62
2024-12-23 10:31:23 -05:00
Adrián Panella
8d9907088b
community(azuresearch): allow to use any valid credential (#28873)
Add option to use any valid credential type.
Differentiates async cases needed by Azure Search.

This could replace the use of a static token
2024-12-23 10:05:48 -05:00
Mohammad Mohtashim
41b6a86bbe
Community: LlamaCppEmbeddings embed_documents and embed_query (#28827)
- **Description:** `embed_documents` and `embed_query` were raising the
error described in the issue. The `Llama` client returns the embeddings
as a nested list, which the current implementation did not account for.
- **Issue:** #28813
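
One way to tolerate both shapes is sketched below. The `unwrap_embedding` helper, and the assumption that the nesting is a single outer list, are illustrative and may not match the actual patch.

```python
from typing import List, Sequence


def unwrap_embedding(raw: Sequence) -> List[float]:
    """Return a flat vector whether `raw` is flat or wrapped in an outer list."""
    if raw and isinstance(raw[0], (list, tuple)):
        # Nested case: the vector is the first inner list.
        return [float(x) for x in raw[0]]
    return [float(x) for x in raw]


flat = unwrap_embedding([0.1, 0.2, 0.3])
nested = unwrap_embedding([[0.1, 0.2, 0.3]])
```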

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-23 09:50:22 -05:00
Darien Schettler
32917a0b98
Update dataframe.py (#28871)
community: optimize DataFrame document loader

**Description:**
Simplify the `lazy_load` method in the DataFrame document loader by
combining text extraction and metadata cleanup into a single operation.
This makes the code more concise while maintaining the same
functionality.
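
The idea can be illustrated without pandas, using plain dictionaries as rows. `dict.pop` returns the text and removes it from the metadata in one step. This is an analogous sketch under that assumption, not the loader's actual code.

```python
from typing import Any, Dict, Iterator, List, Tuple


def lazy_load(
    rows: List[Dict[str, Any]], page_content_column: str = "text"
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (text, metadata) pairs, one per row."""
    for row in rows:
        metadata = dict(row)  # copy so the input rows are not mutated
        # pop() extracts the text and cleans up the metadata in one operation
        text = metadata.pop(page_content_column)
        yield text, metadata


rows = [{"text": "hello", "id": 1}, {"text": "world", "id": 2}]
loaded = list(lazy_load(rows))
```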

**Issue:** N/A

**Dependencies:** None

**Twitter handle:** N/A
2024-12-22 19:16:16 -05:00
yeounhak
f38fc89f35
community: Corrected aload func to be asynchronous from webBaseLoader (#28337)
- **Description:** The aload function, contrary to its name, is not an
asynchronous function, so it cannot work concurrently with other
asynchronous functions.

- **Issue:** #28336 

- **Test:** Done

- **Docs:**
[here](e0a95e5646/docs/docs/integrations/document_loaders/web_base.ipynb (L201))

- **Lint:** All checks passed
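
Why a genuinely asynchronous `aload` matters can be sketched with `asyncio` alone. `fetch` here is a hypothetical coroutine that sleeps instead of doing network I/O, and this `aload` is a simplified illustration rather than the loader's real method.

```python
import asyncio
from typing import List, Tuple


async def fetch(url: str) -> str:
    """Hypothetical fetch; sleeps instead of doing network I/O."""
    await asyncio.sleep(0.01)
    return f"contents of {url}"


async def aload(urls: List[str]) -> List[str]:
    # A real coroutine: callers can await it alongside other async work.
    return list(await asyncio.gather(*(fetch(u) for u in urls)))


async def main() -> Tuple[List[str], str]:
    # aload runs concurrently with another coroutine on the same event loop.
    pages, other = await asyncio.gather(aload(["a", "b"]), fetch("c"))
    return pages, other


pages, other = asyncio.run(main())
```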


---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-20 14:42:52 -05:00
Mohammad Mohtashim
8cf5f20bb5
required tool_choice added for ChatHuggingFace (#28851)
- **Description:** HuggingFace Inference Client V3 now supports
`required` as a `tool_choice` value; this PR adds support for it.
- **Issue:** #28842
2024-12-20 12:06:04 -05:00
Sylvain DEPARTE
fcba567a77
partners: allow to set Prefix in AIMessage (for MistralAI) (#28846)
**Description:**

Added the ability to set the `prefix` attribute, which prevents this error:
```
httpx.HTTPStatusError: Error response 400 while fetching https://api.mistral.ai/v1/chat/completions: {"object":"error","message":"Expected last role User or Tool (or Assistant with prefix True) for serving but got assistant","type":"invalid_request_error","param":null,"code":null}
```

Co-authored-by: Sylvain DEPARTE <sylvain.departe@wizbii.com>
2024-12-20 11:09:45 -05:00
Jacob Mansdorfer
6d81137325
community: adding langchain-predictionguard partner package documentation (#28832)
- *[x] **PR title**: "community: adding langchain-predictionguard
partner package documentation"

- *[x] **PR message**:
- **Description:** This PR adds documentation for the
langchain-predictionguard package to the main langchain repo and
deprecates the current Prediction Guard LLMs package. The LLMs package
was previously broken, so it was also updated one final time so that it
keeps working from this point onward. This enables users to chat
with LLMs through the Prediction Guard ecosystem.
    - **Package Links**: 
        -  [PyPI](https://pypi.org/project/langchain-predictionguard/)
- [Github
Repo](https://www.github.com/predictionguard/langchain-predictionguard)
    - **Issue:** None
    - **Dependencies:** None
- **Twitter handle:** [@predictionguard](https://x.com/predictionguard)

- *[x] **Add tests and docs**: All docs have been added for the partner
package, and the current LLMs package test was updated to reflect
changes.


- *[x] **Lint and test**: Linting tests are all passing.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-20 10:51:44 -05:00
ccurme
f0e858b4e3
core[patch]: release 0.3.28 (#28837) 2024-12-19 17:52:32 -05:00
ccurme
137d1e9564
langchain[patch]: fix test following update to langchain-openai (#28838) 2024-12-19 22:39:48 +00:00
Emmanuel Leroy
c8db5a19ce
langchain_community.chat_models.oci_generative_ai: Fix a bug when using optional parameters in tools (#28829)
When using tools with optional parameters, the parameter `type` is no
longer available since the LangChain update to 0.3 (because of the
pydantic upgrade?); there is now an `anyOf` field instead. This results
in the `type` being `None` in the chat request for the tool parameter,
and the LLM call fails with the error:

```
oci.exceptions.ServiceError: {'target_service': 'generative_ai_inference', 
'status': 400, 'code': '400', 
'opc-request-id': '...', 
'message': 'Parameter definition must have a type.', 
'operation_name': 'chat'
...
}
```

Example code that fails:

```
from langchain_community.chat_models.oci_generative_ai import ChatOCIGenAI
from langchain_core.tools import tool
from typing import Optional

llm = ChatOCIGenAI(
        model_id="cohere.command-r-plus",
        service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
        compartment_id="ocid1.compartment.oc1...",
        auth_profile="your_profile",
        auth_type="API_KEY",
        model_kwargs={"temperature": 0, "max_tokens": 3000},
)

@tool
def test(example: Optional[str] = None):
    """This is the tool to use to test things

    Args:
        example: example variable, defaults to None
    """
    return "this is a test"

llm_with_tools = llm.bind_tools([test])

result = llm_with_tools.invoke("can you make a test for g")
```

This PR sets the param type to `any` in that case, and fixes the
problem.
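
The failure mode and the fallback described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `resolve_param_type` is a hypothetical helper, and the `optional_param` dict is an assumption about the shape of schema emitted for an `Optional[str]` argument.

```python
def resolve_param_type(param_schema: dict) -> str:
    """Return the schema's `type`, falling back to "any" when it is absent.

    Hypothetical helper for illustration only.
    """
    param_type = param_schema.get("type")
    return param_type if param_type is not None else "any"


# A required string parameter carries a plain `type` field.
required_param = {"type": "string"}

# Assumed schema shape for an Optional[str] parameter: `anyOf`, no `type`.
optional_param = {
    "anyOf": [{"type": "string"}, {"type": "null"}],
    "default": None,
}

print(resolve_param_type(required_param))
print(resolve_param_type(optional_param))
```

Without the fallback, the second lookup would yield `None`, which is the value the service rejects in the error shown earlier.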

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-19 22:17:34 +00:00
Bagatur
c3ccd93c12
patch openai json mode test (#28831) 2024-12-19 21:43:32 +00:00
Bagatur
ce6748dbfe
xfail openai image token count test (#28828) 2024-12-19 21:23:30 +00:00
Anusha Karkhanis
26bdf40072
Langchain_Community: SQL LanguageParser (#28430)
## Description
(This PR has contributions from @khushiDesai, @ashvini8, and
@ssumaiyaahmed).

This PR addresses **Issue #11229**, which covers the need for SQL
support in document parsing. SQL support is integrated into the generic
TreeSitter parsing library, allowing LangChain users to easily load
codebases in SQL into smaller, manageable "documents."

This pull request adds a new `SQLSegmenter` class, which provides
the SQL integration.

## Issue
**Issue #11229**: Add support for a variety of languages to
LanguageParser

## Testing
We created a file `test_sql.py` with several tests to ensure the
`SQLSegmenter` is functional. Below are the tests we added:

- `def test_is_valid`: Checks SQL validity.
- `def test_extract_functions_classes`: Extracts individual SQL
statements.
- `def test_simplify_code`: Simplifies SQL code with comments.

---------

Co-authored-by: Syeda Sumaiya Ahmed <114104419+ssumaiyaahmed@users.noreply.github.com>
Co-authored-by: ashvini hunagund <97271381+ashvini8@users.noreply.github.com>
Co-authored-by: Khushi Desai <khushi.desai@advantawitty.com>
Co-authored-by: Khushi Desai <59741309+khushiDesai@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-19 20:30:57 +00:00
Bagatur
a7f2148061
openai[patch]: Release 0.2.14 (#28826) 2024-12-19 11:56:44 -08:00
Bagatur
1378ddfa5f
openai[patch]: type reasoning_effort (#28825) 2024-12-19 19:36:49 +00:00
Erick Friis
6a37899b39
core: dont mutate tool_kwargs during tool run (#28824)
fixes https://github.com/langchain-ai/langchain/issues/24621
2024-12-19 18:11:56 +00:00
Qun
033ac41760
fix crash when using create_xml_agent with parameterless function as … (#26002)
When using `create_xml_agent` or `create_json_chat_agent` to create an
agent, and the function corresponding to the tool is a parameterless
function, the `XMLAgentOutputParser` or `JSONAgentOutputParser` will
parse the tool input into an empty string, and `BaseTool` will parse it
into a positional argument.
The program then crashes because a parameterless function is invoked
with a positional argument. Specifically, the code below raises
`StopIteration` in
[_parse_input](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/tools/base.py#L419)
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_json_chat_agent, create_xml_agent
from langchain_openai import ChatOpenAI

prompt = hub.pull("hwchase17/react-chat-json")

llm = ChatOpenAI()

# agent = create_xml_agent(llm, tools, prompt)
agent = create_json_chat_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke(......)
```
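
The underlying mismatch can be shown without any agent machinery. The sketch below is an illustration under stated assumptions, not the fix merged in this PR: `call_tool` is a hypothetical helper that inspects the function signature and drops an empty input when the function takes no parameters.

```python
import inspect


def get_current_year() -> int:
    """A parameterless tool function."""
    return 2024


def shout(text: str) -> str:
    """A tool function that takes one argument."""
    return text.upper()


def call_tool(func, tool_input):
    """Call `func`, omitting an empty input for parameterless functions.

    Hypothetical helper for illustration only.
    """
    params = inspect.signature(func).parameters
    if not params and tool_input in ("", None):
        return func()
    return func(tool_input)


# Passing the parser's empty string positionally fails for get_current_year.
try:
    get_current_year("")
except TypeError as exc:
    print(f"direct call fails: {exc}")

print(call_tool(get_current_year, ""))
print(call_tool(shout, "hello"))
```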

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-19 13:00:46 -05:00
Luke
f69695069d
text_splitters: Add HTMLSemanticPreservingSplitter (#25911)
**Description:** 

Current HTML splitters rely on secondary use of the
`RecursiveCharacterSplitter` to further chunk the document into
manageable chunks. The issue with this is that it fails to maintain
important structures such as tables, lists, etc. within HTML.

This implementation of an HTML splitter allows the user to define a
maximum chunk size, HTML elements to preserve in full, an option to
preserve `<a>` href links in the output, and custom handlers.

The core splitting begins with headers, similar to `HTMLHeaderSplitter`.
If these sections exceed `max_chunk_size`, further
recursive splitting is triggered. During this splitting, elements listed
for preservation are excluded from the splitting process. This can cause
chunks to be slightly larger than the max size, depending on the
preserved length. However, all contextual relevance of the preserved
item remains intact.
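
The idea of excluding preserved elements from splitting can be sketched with the standard library alone. This is not the `HTMLSemanticPreservingSplitter` implementation: `split_preserving` is a hypothetical function, the placeholder approach is an assumption, and the regex does not handle nested elements of the same tag.

```python
import re


def split_preserving(text, max_chunk_size, preserve_tags=("table",)):
    """Greedy word-based splitter that never breaks inside preserved elements."""
    preserved = {}

    def stash(match):
        # Swap the whole element for a single unsplittable placeholder token.
        key = f"__PRESERVED_{len(preserved)}__"
        preserved[key] = match.group(0)
        return key

    for tag in preserve_tags:
        pattern = re.compile(rf"<{tag}\b.*?</{tag}>", re.DOTALL | re.IGNORECASE)
        text = pattern.sub(stash, text)

    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chunk_size:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)

    # Restore the preserved elements; a chunk may now exceed max_chunk_size.
    restored = []
    for chunk in chunks:
        for key, html in preserved.items():
            chunk = chunk.replace(key, html)
        restored.append(chunk)
    return restored


html = "intro words here <table><tr><td>a b c d e f g</td></tr></table> tail words"
for chunk in split_preserving(html, max_chunk_size=20):
    print(repr(chunk))
```

The chunk that contains the table ends up longer than `max_chunk_size`, which mirrors the trade-off described above.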

**Custom Handlers**: Sometimes, companies such as Atlassian have custom
HTML elements that are not parsed by default with `BeautifulSoup`.
Custom handlers allow a user to provide a function to be run whenever a
specific HTML tag is encountered. This allows the user to preserve and
gather information within custom HTML tags that `bs4` will potentially
miss during extraction.

**Dependencies:** User will need to install `bs4` in their project to
utilise this class

I have also added in `how_to` and unit tests, which require `bs4` to
run, otherwise they will be skipped.

Flowchart of process:


![HTMLSemanticPreservingSplitter](https://github.com/user-attachments/assets/20873c36-22ed-4c80-884b-d3c6f433f5a7)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-19 12:09:22 -05:00
Tommaso De Lorenzo
24bfa062bf
langchain: add support for Google Anthropic Vertex AI model garden provider in init_chat_model (#28177)
Simple modification to add support for anthropic models deployed in
Google Vertex AI model garden in `init_chat_model` importing
`ChatAnthropicVertex`

- [v] **Lint and test**
2024-12-19 12:06:21 -05:00
Erick Friis
ff7b01af88
anthropic: less pydantic for client (#28823) 2024-12-19 08:00:02 -08:00
Erick Friis
f1d783748a
anthropic: sdk bump (#28820) 2024-12-19 15:39:21 +00:00
Erick Friis
907f36a6e9
fireworks: fix lint (#28821) 2024-12-19 15:36:36 +00:00
Erick Friis
6526db4871
community: bump core (#28819) 2024-12-19 06:41:53 -08:00
Vignesh A
4c9acdfbf1
Community : Add OpenAI prompt caching and reasoning tokens tracking (#27135)
Added token tracking for OpenAI's prompt caching and reasoning tokens.
Costs updated from https://openai.com/api/pricing/

Usage example:
```python
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="o1-mini",temperature=1)

with get_openai_callback() as cb:
    response = llm.invoke("hi "*1500)
    print(cb)
```
Output
```
Tokens Used: 1720
	Prompt Tokens: 1508
		Prompt Tokens Cached: 1408
	Completion Tokens: 212
		Reasoning Tokens: 192
Successful Requests: 1
Total Cost (USD): $0.0049559999999999995
```
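
The total above can be reproduced by hand. In the sketch below, the per-million-token rates are assumptions chosen because they are consistent with the reported total; they are not taken from the PR, and `estimate_cost` is a hypothetical helper rather than the callback's code. Reasoning tokens are treated as part of the completion tokens, as the indentation of the output suggests.

```python
# Assumed USD rates per 1M tokens (illustrative, not from the PR).
INPUT_RATE = 3.00
CACHED_INPUT_RATE = 1.50
OUTPUT_RATE = 12.00


def estimate_cost(prompt_tokens, cached_tokens, completion_tokens):
    """Bill cached prompt tokens at the cached rate and the rest at full rate."""
    uncached_tokens = prompt_tokens - cached_tokens
    total = (
        uncached_tokens * INPUT_RATE
        + cached_tokens * CACHED_INPUT_RATE
        + completion_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


# Token counts from the output above: 1508 prompt (1408 cached), 212 completion.
cost = estimate_cost(prompt_tokens=1508, cached_tokens=1408, completion_tokens=212)
print(f"{cost:.6f}")  # 0.004956
```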

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-19 09:31:13 -05:00
ScriptShi
97f1e1d39f
community: tablestore vector store check the dimension of the embedding when writing it to store. (#28812)
Added some restrictions to a vectorstore I released in the community
before.
2024-12-19 09:30:43 -05:00
Wang Ran (汪然)
f48755d35b
core: typo Utilities for tests. -> Utilities for pydantic. (#28814)
**Description:** typo
2024-12-19 09:26:17 -05:00
Wang Ran (汪然)
51b8ddaf10
core: typo in runnable (#28815)
**Description:** Typo
2024-12-19 09:25:57 -05:00
Erick Friis
3b036a1cf2
partners/fireworks: release 0.2.6 (#28805) 2024-12-18 22:48:35 +00:00
Erick Friis
4eb8bf7793
partners/anthropic: release 0.3.1 (#28801) 2024-12-18 22:45:38 +00:00
Lu Peng
50afa7c4e7
community: add new parameter default_headers (#28700)
- [x] **PR title**: "package: description"
- "community: 1. add new parameter `default_headers` for oci model
deployments and oci chat model deployments. 2. updated k parameter in
OCIModelDeploymentLLM class."


- [x] **PR message**:
- **Description:** 1. Add a new `default_headers` parameter for OCI model
deployments and OCI chat model deployments. 2. Update the `k` parameter in
the `OCIModelDeploymentLLM` class.


- [x] **Add tests and docs**:
  1. unit tests
  2. notebook

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-18 22:33:23 +00:00
Erick Friis
cc616de509
partners/xai: release 0.1.1 (#28806) 2024-12-18 22:15:24 +00:00
Erick Friis
ba8c1b0d8c
partners/groq: release 0.2.2 (#28804) 2024-12-18 22:12:02 +00:00
Erick Friis
a119cae5bd
partners/mistralai: release 0.2.4 (#28803) 2024-12-18 22:11:48 +00:00
Erick Friis
514d78516b
partners/ollama: release 0.2.2 (#28802) 2024-12-18 22:11:08 +00:00
Bagatur
68940dd0d6
openai[patch]: Release 0.2.13 (#28800) 2024-12-18 22:08:47 +00:00
Erick Friis
4dc28b43ac
community: release 0.3.13 (#28798) 2024-12-18 21:58:46 +00:00
Bagatur
557f63c2e6
core[patch]: Release 0.3.27 (#28799) 2024-12-18 21:58:03 +00:00
Bagatur
4a531437bb
core[patch], openai[patch]: Handle OpenAI developer msg (#28794)
- Convert developer openai messages to SystemMessage
- store additional_kwargs={"__openai_role__": "developer"} so that the
correct role can be reconstructed if needed
- update ChatOpenAI to read in openai_role
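
The round trip described in the list above can be sketched with stand-in types. This is a minimal illustration, not the `langchain_core` or `langchain_openai` code: the `SystemMessage` dataclass and both helper functions are hypothetical, and only the `__openai_role__` key comes from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class SystemMessage:
    """Stand-in for a system message type (illustrative only)."""

    content: str
    additional_kwargs: dict = field(default_factory=dict)


def from_openai_dict(message: dict) -> SystemMessage:
    """Convert an OpenAI-style dict, remembering a `developer` role."""
    role = message["role"]
    if role == "developer":
        return SystemMessage(message["content"], {"__openai_role__": "developer"})
    if role == "system":
        return SystemMessage(message["content"])
    raise ValueError(f"unsupported role in this sketch: {role}")


def to_openai_dict(message: SystemMessage) -> dict:
    """Rebuild the OpenAI-style dict, restoring the original role if stored."""
    role = message.additional_kwargs.get("__openai_role__", "system")
    return {"role": role, "content": message.content}


original = {"role": "developer", "content": "Answer briefly."}
print(to_openai_dict(from_openai_dict(original)))
```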

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-18 21:54:07 +00:00
Erick Friis
079f1d93ab
langchain: release 0.3.13 (#28797) 2024-12-18 12:32:00 -08:00
Yuxin Chen
3256b5d6ae
text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter (#28373)
- **Description:** 
This PR resolves an issue with the
`ExperimentalMarkdownSyntaxTextSplitter` class, which retains the
internal state across multiple calls to the `split_text` method. This
behaviour caused an unintended accumulation of chunks in `self`
variables, leading to incorrect outputs when processing multiple
Markdown files sequentially.

- Modified `libs/text-splitters/langchain_text_splitters/markdown.py` to
reset the relevant internal attributes at the start of each `split_text`
invocation. This ensures each call processes the input independently.
- Added unit tests in
`libs/text-splitters/tests/unit_tests/test_text_splitters.py` to verify
the fix and ensure the state does not persist across calls.
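
The bug pattern and the reset-at-start fix can be illustrated with two toy classes. Both are hypothetical and unrelated to the real `ExperimentalMarkdownSyntaxTextSplitter` internals; they only show why state kept on `self` leaks between calls.

```python
class LeakySplitter:
    """Accumulates chunks on `self`, so results leak across calls."""

    def __init__(self):
        self.chunks = []

    def split_text(self, text):
        self.chunks.extend(part for part in text.split("\n\n") if part)
        return self.chunks


class ResettingSplitter(LeakySplitter):
    """Resets the accumulated state at the start of every call."""

    def split_text(self, text):
        self.chunks = []
        return super().split_text(text)


leaky = LeakySplitter()
leaky.split_text("first doc\n\nsecond paragraph")
print(leaky.split_text("another doc"))  # still contains the first document's chunks

fixed = ResettingSplitter()
fixed.split_text("first doc\n\nsecond paragraph")
print(fixed.split_text("another doc"))  # only the second document's chunks
```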

- **Issue:**  
Fixes [#26440](https://github.com/langchain-ai/langchain/issues/26440).

- **Dependencies:**
No additional dependencies are introduced with this change.


- [x] Unit tests were added to verify the changes.
- [x] Updated documentation where necessary.  
- [x] Ran `make format`, `make lint`, and `make test` to ensure
compliance with project standards.

---------

Co-authored-by: Angel Chen <angelchen396@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 20:27:59 +00:00
Mohammad Mohtashim
7c8f977695
Community: Fix with_structured_output for ChatSambaNovaCloud (#28796)
- **Description:** `kwargs` was being checked as a `None` object, which
kept the rest of the code in `with_structured_output` from being
executed. This PR fixes the check.
- **Issue:** #28776
2024-12-18 14:35:06 -05:00
V.Prasanna kumar
684b146b18
Fixed adding float values into DynamoDB (#26562)
- [x] **PR title**: Add float Message into Dynamo DB
  -  community
  - Example: "community: Chat Message History 


- [x] **PR message**: 
- **Description:** Pushing float values into DynamoDB raises an error;
this is solved by converting them to `str`.
    - **Issue:** Float values are not getting pushed
    - **Twitter handle:** VpkPrasanna
    
    
I have added a utility function for the str conversion; let me know where to
place it, happy to do a commit.
    
    This PR comes from a discussion of #26543
    
    @hwchase17 @baskaryan @efriis
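
A conversion utility of the kind mentioned above might look like the following. This is a sketch under assumptions, not the function added in the PR: `floats_to_str` is a hypothetical name, and it simply walks nested dicts and lists and replaces every `float` with its string form.

```python
def floats_to_str(value):
    """Recursively replace floats with strings; leave other values untouched."""
    if isinstance(value, float):
        return str(value)
    if isinstance(value, dict):
        return {key: floats_to_str(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [floats_to_str(item) for item in value]
    return value


message = {
    "type": "ai",
    "data": {"content": "hi", "score": 0.5, "tags": [1, 2.0, "x"]},
}
print(floats_to_str(message))
```

Integers, booleans, and strings pass through unchanged, so only the values that trigger the error are affected.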

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 13:45:00 -05:00
William FH
50ea1c3ea3
[Core] respect tracing project name cvar (#28792) 2024-12-18 10:02:02 -08:00
Martin Triska
e6b41d081d
community: DocumentLoaderAsParser wrapper (#27749)
## Description

This pull request introduces the `DocumentLoaderAsParser` class, which
acts as an adapter to transform document loaders into parsers within the
LangChain framework. The class enables document loaders that accept a
`file_path` parameter to be utilized as blob parsers. This is
particularly useful for integrating various document loading
capabilities seamlessly into the LangChain ecosystem.

When merged in together with PR
https://github.com/langchain-ai/langchain/pull/27716 It opens options
for `SharePointLoader` / `OneDriveLoader` to process any filetype that
has a document loader.

### Features

- **Flexible Parsing**: The `DocumentLoaderAsParser` class can adapt any
document loader that meets the criteria of accepting a `file_path`
argument, allowing for lazy parsing of documents.
- **Compatibility**: The class has been designed to work with various
document loaders, making it versatile for different use cases.

### Usage Example

To use the `DocumentLoaderAsParser`, you would initialize it with a
suitable document loader class and any required parameters. Here’s an
example of how to do this with the `UnstructuredExcelLoader`:

```python
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser
from langchain_community.document_loaders.excel import UnstructuredExcelLoader

# Initialize the parser adapter with UnstructuredExcelLoader
xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")

# Use parser, for ex. pass it to MimeTypeBasedParser
MimeTypeBasedParser(
    handlers={
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": xlsx_parser
    }
)
```
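
The adapter idea itself can be sketched without LangChain. Everything below is hypothetical stand-in code (`FakeBlob`, `FakeTextLoader`, `LoaderAsParser`) and is not the `DocumentLoaderAsParser` source; it assumes the adapter instantiates the wrapped loader with `file_path` taken from the blob and delegates to its `lazy_load`.

```python
import inspect
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FakeBlob:
    """Stand-in for a blob that knows its path."""

    path: str


class FakeTextLoader:
    """Stand-in loader that accepts a `file_path` argument."""

    def __init__(self, file_path: str, prefix: str = ""):
        self.file_path = file_path
        self.prefix = prefix

    def lazy_load(self) -> Iterator[str]:
        yield f"{self.prefix}{self.file_path}"


class LoaderAsParser:
    """Adapt a loader class that takes `file_path` into a blob parser."""

    def __init__(self, loader_cls, **loader_kwargs):
        params = inspect.signature(loader_cls.__init__).parameters
        if "file_path" not in params:
            raise TypeError(f"{loader_cls.__name__} does not accept file_path")
        self.loader_cls = loader_cls
        self.loader_kwargs = loader_kwargs

    def lazy_parse(self, blob: FakeBlob) -> Iterator[str]:
        # The loader is created lazily, once per blob.
        loader = self.loader_cls(file_path=blob.path, **self.loader_kwargs)
        yield from loader.lazy_load()


parser = LoaderAsParser(FakeTextLoader, prefix="doc:")
print(list(parser.lazy_parse(FakeBlob("report.xlsx"))))
```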


- **Dependencies:** None
- **Twitter handle:** @martintriska1

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 12:47:08 -05:00
Erick Friis
9b024d00c9
text-splitters: release 0.3.4 (#28795) 2024-12-18 09:44:36 -08:00
Erick Friis
5cf965004c
core: release 0.3.26 (#28793) 2024-12-18 17:28:42 +00:00
Mohammad Mohtashim
d49df4871d
[Community]: Image Extraction Fixed for PDFPlumberParser (#28491)
- **Description:** One-bit images were raising an error in
`PDFPlumberParser`; this PR fixes it.
 - **Issue:** #28480

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 11:45:48 -05:00
binhnd102
f723a8456e
Fixes: community: fix LanceDB return no metadata (#27024)
- [ x ] Fix the case where LanceDB returns a table without a metadata column
- **Description:** Check the table schema; if it has no metadata column,
initialize the `Document` with an empty dict as its metadata argument
    - **Issue:** https://github.com/langchain-ai/langchain/issues/27005

- [ x ] **Add tests and docs**

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-18 15:21:28 +00:00
ANSARI MD AAQIB AHMED
91d28ef453
Add langchain-yt-dlp Document Loader Documentation (#28775)
## Overview
This PR adds documentation for the `langchain-yt-dlp` package, a YouTube
document loader that uses `yt-dlp` for YouTube video metadata
extraction.

## Changes
- Added documentation notebook for YoutubeLoader
- Updated packages.yml to include langchain-yt-dlp

## Motivation
The existing LangChain YoutubeLoader was unable to fetch YouTube
metadata due to changes in YouTube's structure. This package resolves
those issues by leveraging the `yt-dlp` library.

## Features
- Reliable YouTube metadata extraction

## Related
- Package Repository: https://github.com/aqib0770/langchain-yt-dlp
- PyPI Package: https://pypi.org/project/langchain-yt-dlp/

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 10:16:50 -05:00
GITHUBear
33b1fb95b8
partners: langchain-oceanbase Integration (#28782)
Hi, langchain team! I'm a maintainer of
[OceanBase](https://github.com/oceanbase/oceanbase).

Following the integration guidance, I created a Python library named
[langchain-oceanbase](https://github.com/oceanbase/langchain-oceanbase)
to integrate `Oceanbase Vector Store` with `Langchain`.

So I'd like to add the required docs. I will appreciate your feedback.
Thank you!

---------

Signed-off-by: shanhaikang.shk <shanhaikang.shk@oceanbase.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-18 14:51:49 +00:00
Rave Harpaz
986b752fc8
Add OCI Generative AI new model and structured output support (#28754)
- [X] **PR title**: 
 community: Add new model and structured output support


- [X] **PR message**: 
- **Description:** add support for meta llama 3.2 image handling, and
JSON mode for structured output
    - **Issue:** NA
    - **Dependencies:** NA
    - **Twitter handle:** NA


- [x] **Add tests and docs**: 
  1. we have updated our unit tests,
  2. no changes required for documentation.


- [x] **Lint and test**: 
`make format`, `make lint` and `make test` were run successfully

---------

Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-18 09:50:25 -05:00
David Pryce-Compson
ef24220d3f
community: adding haiku 3.5 and opus callbacks (#28783)
**Description:** 
Adding new AWS Bedrock models and their respective costs to match
https://aws.amazon.com/bedrock/pricing/ for the Bedrock callback

**Issue:** 
Missing models for those that wish to try them out

**Dependencies:**
Nothing added

**Twitter handle:**
@David_Pryce and / or @JamfSoftware

2024-12-18 09:45:10 -05:00
Yudai Kotani
05a44797ee
langchain_community: Add default None values to DocumentAttributeValue class properties (#28785)
**Description**: 
This PR addresses an issue where the DocumentAttributeValue class
properties did not have default values of None. By explicitly setting
the Optional attributes (DateValue, LongValue, StringListValue, and
StringValue) to default to None, this change ensures the class functions
as expected when no value is provided for these attributes.

**Changes Made**:
- Added default `None` values to the following properties of the
`DocumentAttributeValue` class: `DateValue`, `LongValue`,
`StringListValue`, and `StringValue`.
- Removed the invalid argument `extra="allow"` from the `BaseModel`
inheritance.

**Dependencies**: None.

**Twitter handle (optional)**: @__korikori1021

**Checklist**
- [x] Verified that KendraRetriever works as expected after the changes.

Co-authored-by: y1u0d2a1i <y.kotani@raksul.com>
2024-12-18 09:43:04 -05:00
Satyam Kumar
90f7713399
refactor: improve docstring parsing logic for Google style (#28730)
Description:  
Improved the `_parse_google_docstring` function in `langchain/core` to
support parsing multi-paragraph descriptions before the `Args:` section
while maintaining compliance with Google-style docstring guidelines.
This change ensures better handling of docstrings with detailed function
descriptions.
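
What "multi-paragraph description before `Args:`" means in practice can be shown with a simplified parser. This is not the `_parse_google_docstring` function from the PR: `parse_google_docstring` below is a hypothetical, reduced version that ignores indentation and would misread a continuation line containing a colon.

```python
def parse_google_docstring(docstring: str):
    """Split a Google-style docstring into (description, {arg: description})."""
    section_headers = ("Returns:", "Raises:", "Yields:", "Example:", "Examples:")
    description_lines, arg_lines = [], []
    in_args = False
    for line in docstring.strip().splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if in_args and stripped in section_headers:
            break
        (arg_lines if in_args else description_lines).append(stripped)

    # Blank lines inside the description are kept, so paragraphs survive.
    description = "\n".join(description_lines).strip()

    args, current = {}, None
    for line in arg_lines:
        if not line:
            continue
        if ":" in line:
            name, text = line.split(":", 1)
            current = name.split("(")[0].strip()
            args[current] = text.strip()
        elif current is not None:
            args[current] += " " + line
    return description, args


doc = """Summary line.

    More detail in a second paragraph.

    Args:
        x: first value
        y (int): second value,
            continued here

    Returns:
        nothing
    """
description, args = parse_google_docstring(doc)
print(description)
print(args)
```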

Issue:  
Fixes #28628

Dependencies:  
None.

Twitter handle:  
@isatyamks

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 09:35:19 -05:00
Dong Shin
0b1359801e
community: add trust_env at web_base_loader (#28514)
- **Description:** I am working to address a similar issue to the one
mentioned in https://github.com/langchain-ai/langchain/pull/19499.
Specifically, there is a problem with the Webbase loader used in
open-webui, where it fails to load the proxy configuration. This PR aims
to resolve that issue.





---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-17 21:18:16 -05:00
Erick Friis
be738aa7de
packages: enable vertex api build (#28773) 2024-12-17 11:31:14 -08:00
Bagatur
ac278cbe8b
core[patch]: export InjectedToolCallId (#28772) 2024-12-17 19:29:20 +00:00
Bagatur
e4d3ccf62f
json mode standard test (#25497)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-17 18:47:34 +00:00
Frank Dai
e81433497b
community: support Confluence cookies (#28760)
**Description**: Some Confluence instances don't support personal access
tokens; in that case, cookies are a convenient way to authenticate. This
PR adds support for Confluence cookies.

**Twitter handle**: soulmachine
2024-12-17 12:16:36 -05:00
ccurme
b745281eec
anthropic[patch]: increase timeouts for integration tests (#28767)
Some tests consistently ran into the 10s limit in CI.
2024-12-17 15:47:17 +00:00
Vinit Kudva
a00258ec12
chroma: fix persistence if client_settings is passed in (#25199)
…ent path given.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-17 10:03:02 -05:00
Omri Eliyahu Levy
f8883a1321
partners/voyageai: enable setting output dimension (#28740)
Voyage has introduced voyage-3-large and voyage-code-3, which feature
different output dimensions by leveraging a technique called "Matryoshka
Embeddings" (see blog -
https://blog.voyageai.com/2024/12/04/voyage-code-3/).
These two models are available in various sizes: [256, 512, 1024, 2048]
(https://docs.voyageai.com/docs/embeddings#model-choices).

This PR adds the option to set the required output dimension.
2024-12-17 10:02:00 -05:00
German Martin
3a1d05394d
community: Apache AGE wrapper. Ensure Node Uniqueness by ID. (#28759)
**Description:**

The Apache AGE graph integration incorrectly handled node merging,
allowing duplicate nodes with different IDs but the same type and other
properties. Unlike
[Neo4j](cdf6202156/libs/community/langchain_community/graphs/neo4j_graph.py (L47)),
[Memgraph](cdf6202156/libs/community/langchain_community/graphs/memgraph_graph.py (L50)),
[Kuzu](cdf6202156/libs/community/langchain_community/graphs/kuzu_graph.py (L253)),
and
[Gremlin](cdf6202156/libs/community/langchain_community/graphs/gremlin_graph.py (L165)),
it did not use the node ID as the primary identifier for merging.

This inconsistency caused data integrity issues and unexpected behavior
when users expected updates to specific nodes by ID.

**Solution:**
This PR modifies the `node_insert_query` to `MERGE` nodes based on label
and ID *only* and updates properties with `SET`, aligning the behavior
with other graph database integrations. The `_format_properties` method
was also modified to handle id overrides.
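
The merge-by-ID behavior can be sketched as a small query builder. This is an illustration only, not the `node_insert_query` from the PR: `build_node_merge_query` is a hypothetical function, and inlining values with `json.dumps` is a simplification; real code should rely on the driver's own quoting or parameters.

```python
import json


def build_node_merge_query(label: str, node_id: str, properties: dict) -> str:
    """Build a Cypher query that matches on label and id only, then sets properties."""
    # The id passed explicitly wins over any "id" key inside properties.
    props = {key: value for key, value in properties.items() if key != "id"}
    query = f"MERGE (n:`{label}` {{id: {json.dumps(node_id)}}})"
    if props:
        assignments = ", ".join(
            f"n.`{key}` = {json.dumps(value)}" for key, value in props.items()
        )
        query += f" SET {assignments}"
    return query + " RETURN n"


print(build_node_merge_query("Person", "p1", {"name": "Ada", "id": "ignored"}))
```

Because only the label and ID appear in the `MERGE` pattern, a second insert with the same ID and different properties updates the existing node instead of creating a duplicate.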

**Impact:**

This fix ensures data integrity by preventing duplicate nodes, and
provides a consistent behavior across graph database integrations.
2024-12-17 09:21:59 -05:00
gsa9989
cdf6202156
cosmosdbnosql: Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook (#24424)
* Added Cosmos DB NoSQL Semantic Cache Integration with tests and
jupyter notebook

---------

Co-authored-by: Aayush Kataria <aayushkataria3011@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 21:57:05 -05:00
Brian Burgin
27a9056725
community: Fix ChatLiteLLMRouter runtime issues (#28163)
**Description:** Fix ChatLiteLLMRouter ctor validation and model_name
parameter
**Issue:** #19356, #27455, #28077
**Twitter handle:** @bburgin_0
2024-12-16 18:17:39 -05:00
Mikhail Khludnev
00deacc67e
docs, external: introduce langchain-localai (#28751)
Thank you for contributing to LangChain!

Referring to https://github.com/mkhludnev/langchain-localai

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 22:22:37 +00:00
Erick Friis
d4b5e7ef22
community: recommend RedisVectorStore over Redis (#28749) 2024-12-16 21:08:30 +00:00
Hiros
8f5e72de05
community: Correctly handle multi-element rich text (#25762)
**Description:**

- Add _concatenate_rich_text method to combine all elements in rich text
arrays
- Update load_page method to use _concatenate_rich_text for rich text
properties
- Ensure all text content is captured, including inline code and
formatted text
- Add unit tests to verify correct handling of multi-element rich text

This fix prevents truncation of content after backticks or other
formatting elements.
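
The concatenation step can be sketched as below. This is not the loader's `_concatenate_rich_text` method: the function is a hypothetical standalone version, and the sample array is an assumed, trimmed-down shape of a rich text property in which each element carries a `plain_text` field.

```python
def concatenate_rich_text(rich_text_array):
    """Join the plain text of every element instead of reading only the first."""
    return "".join(item.get("plain_text", "") for item in rich_text_array)


# Assumed shape: formatting such as inline code splits the text into elements.
rich_text = [
    {"plain_text": "Run ", "annotations": {"code": False}},
    {"plain_text": "pip install", "annotations": {"code": True}},
    {"plain_text": " first", "annotations": {"code": False}},
]

print(rich_text[0]["plain_text"])        # what reading only the first element gives
print(concatenate_rich_text(rich_text))  # the full text
```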

 **Issue:**

Using the Notion DB Loader, the text for `richtext` and `title` was
truncated after the first element, because the loader only read the
first element.

**Dependencies:** None.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 20:20:27 +00:00
Antonio Lanza
b2102b8cc4
text-splitters: Inconsistent results with NLTKTextSplitter's add_start_index=True (#27782)
This PR closes #27781

# Problem
The current implementation of `NLTKTextSplitter` uses
`sent_tokenize`. However, `sent_tokenize` does not keep the characters
between two tokenized sentences, which causes errors when
`add_start_index=True` is used, as described in issue #27781. In
particular:
```python
from nltk.tokenize import sent_tokenize

output1 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output1)
output2 = sent_tokenize("Innovation drives our success.        Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output2)
# Output of the two print calls (the run of spaces in the second input is lost):
# ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
# ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
```

# Solution
With the new `use_span_tokenize` parameter, we can use NLTK to create
sentences (with `span_tokenize`) while also keeping the extra characters,
so that the chunks can still be mapped to the original text.
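
The span-based idea can be sketched without NLTK. In the code below, `sentence_spans` is a hypothetical regex stand-in for a span tokenizer, and `split_keeping_gaps` is an assumption about how the gap characters might be retained; neither is the code from this PR.

```python
import re


def sentence_spans(text):
    """Return (start, end) offsets of sentences; stand-in for a span tokenizer."""
    return [m.span() for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]", text)]


def split_keeping_gaps(text):
    """Extend each sentence up to the start of the next one, keeping the gaps."""
    spans = sentence_spans(text)
    chunks = []
    for i, (start, _end) in enumerate(spans):
        if i == 0:
            start = 0  # keep any leading characters with the first chunk
        next_start = spans[i + 1][0] if i + 1 < len(spans) else len(text)
        chunks.append(text[start:next_start])
    return chunks


text = (
    "Innovation drives our success.        "
    "Collaboration fosters creative solutions. "
    "Efficiency enhances data management."
)
chunks = split_keeping_gaps(text)

# Every chunk can be located in the original text at a running offset.
offset = 0
for chunk in chunks:
    print(offset, repr(chunk))
    offset += len(chunk)
```

Because the chunks concatenate back to the original string, a start index computed for each chunk stays valid even when sentences are separated by irregular whitespace.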

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-16 19:53:15 +00:00
Tari Yekorogha
d262d41cc0
community: added FalkorDB vector store support i.e implementation, test, docs an… (#26245)
**Description:** Added support for FalkorDB Vector Store, including its
implementation, unit tests, documentation, and an example notebook. The
FalkorDB integration allows users to efficiently manage and query
embeddings in a vector database, with relevance scoring and maximal
marginal relevance search. The following components were implemented:

- Core implementation for FalkorDBVector store.
- Unit tests ensuring proper functionality and edge case coverage.
- Example notebook demonstrating an end-to-end setup, search, and
retrieval using FalkorDB.

**Twitter handle:** @tariyekorogha

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:37:55 +00:00
Aaron Pham
12fced13f4
chore(community): update to OpenLLM 0.6 (#24609)
Update to OpenLLM 0.6, in which we decided to make use of OpenLLM's
OpenAI-compatible endpoint. Thus, OpenLLM now becomes a thin
wrapper around the OpenAI wrapper.

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

---------

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-16 14:30:07 -05:00
Lvlvko
5c17a4ace9
community: support Hunyuan Embedding (#23160)
## description

- I refactored `ChatHunyuan` to use the tencentcloud sdk because I found
the original one didn't work in my application
- I added `HunyuanEmbeddings` using the tencentcloud sdk
- Both of them extend the base classes of langchain. I have fully
tested them in my application

## Dependencies
- tencentcloud-sdk-python

---------

Co-authored-by: centonhuang <centonhuang@tencent.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:27:19 +00:00
Harrison Chase
de7996c2ca
core: add kwargs support to VectorStore (#25934)
`VectorStore` has been missing the kwargs passthrough until now

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 18:57:57 +00:00
Lorenzo
b79a1156ed
community: correct return type of get_files_from_directory in github tool (#27885)
### About:
- **Description:** the _get_files_from_directory_ method returns a
string, but it is used in other methods that expect a List[str]
- **Issue:** None
- **Dependencies:** None

This pull request adds a new method, _list_files_, that reuses the old
logic of _get_files_from_directory_ but returns a List[str].
The behavior of _get_files_from_directory_ is unchanged.
2024-12-16 10:30:33 -08:00
Sheepsta300
580a8d53f9
community: Add configurable VisualFeatures to the AzureAiServicesImageAnalysisTool (#27444)
Thank you for contributing to LangChain!

- [ ] **PR title**: community: Add configurable `VisualFeatures` to the
`AzureAiServicesImageAnalysisTool`


- [ ] **PR message**:  
- **Description:** The `AzureAiServicesImageAnalysisTool` is a good
service and utilises the Azure AI Vision package under the hood.
However, since the creation of this tool, new `VisualFeatures` have been
added to allow the user to request other image specific information to
be returned. Currently, the tool offers neither a way to configure which
features should be returned nor any of the newer feature types. The
aim of this PR is to address this and expose more of the Azure Service
in this integration.
- **Dependencies:** no new dependencies in the main class file,
azure.ai.vision.imageanalysis added to extra test dependencies file.


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. Although no tests exist for already implemented Azure Service tools,
I've created 3 unit tests for this class that test initialisation and
credentials, local file analysis and a test for the new changes/
features option.


- [ ] **Lint and test**: All linting has passed.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 18:30:04 +00:00
Erick Friis
1c120e9615
core: xml output parser tags docstring (#28745) 2024-12-16 18:25:16 +00:00
Ana
ebab2ea81b
Fix Azure National Cloud authentication using token (RBAC) (Generated by Ana - AI SDE) (#25843)
This pull request addresses the issue with authenticating Azure National
Cloud using token (RBAC) in the AzureSearch vectorstore implementation.

## Changes

- Modified the `_get_search_client` method in `azuresearch.py` to pass
`additional_search_client_options` to the `SearchIndexClient` instance.

## Implementation Details

The patch updates the `SearchIndexClient` initialization to include the
`additional_search_client_options` parameter:

```python
index_client: SearchIndexClient = SearchIndexClient(
    endpoint=endpoint,
    credential=credential,
    user_agent=user_agent,
    **additional_search_client_options
)
```

This change allows the `audience` parameter to be correctly passed when
using Azure National Cloud, fixing the authentication issues with
GovCloud & RBAC.

This patch was generated by [Ana - AI SDE](https://openana.ai/), an
AI-powered software development assistant.

This is a fix for [Issue
25823](https://github.com/langchain-ai/langchain/issues/25823)

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-16 18:22:24 +00:00
chenzimin
169d419581
community: Remove all other keys in ChatLiteLLM and add api_key (#28097)
- **PR title**: "community: Remove all other keys in ChatLiteLLM and add
api_key"


- **PR message**: Currently, no API keys are passed to LiteLLM, and
LiteLLM only takes one `api_key` parameter. Therefore I removed all current
`*_api_key` attributes (they are not used) and added an `api_key` that is
passed to ChatLiteLLM.
  - Should fix issue #27826

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 17:54:29 +00:00
German Martin
d5d18c62b3
community: Apache AGE wrapper additional edge cases. (#28151)
Description:
The current AGEGraph() implementation does some custom wrapping for graph
queries. The relevant method is _wrap_query(), which parses the fields
from the original query to add some SQL context to it.
This PR improves the current parsing logic to cover additional edge cases,
which are added to the test coverage. Previously, any node property name
or value containing the "return" literal would break the graph / SQL
query.
We discovered this while dealing with real-world datasets. It is not an
uncommon scenario, and I think it needs to be covered.
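
A minimal sketch of the edge case, assuming a simplified parser. This is
not the actual `_wrap_query()` logic: the query and both helper functions
below are hypothetical and only show why a plain substring search for
"return" picks up the wrong position.

```python
import re

# Hypothetical Cypher query whose property value contains the word "return".
query = "MATCH (n:Policy {name: 'return policy'}) RETURN n.name AS name, n.id AS id"


def naive_return_clause(q):
    """Take everything after the first occurrence of 'return' (fragile)."""
    idx = q.lower().find("return")
    return q[idx + len("return"):].strip()


def keyword_return_clause(q):
    """Take everything after the last standalone RETURN keyword."""
    # Mask quoted literals so that 'return' inside a value is ignored.
    masked = re.sub(r"'[^']*'|\"[^\"]*\"", lambda m: "_" * len(m.group()), q)
    # \b keeps identifiers such as return_date from matching.
    matches = list(re.finditer(r"\bRETURN\b", masked, flags=re.IGNORECASE))
    if not matches:
        raise ValueError("query has no RETURN clause")
    return q[matches[-1].end():].strip()


naive = naive_return_clause(query)
fixed = keyword_return_clause(query)
```

Here `naive` starts inside the quoted value, while `fixed` contains only
the returned fields.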
2024-12-16 11:28:01 -05:00
Rock2z
768e4a7fd4
[community][fix] Compatibility support to bump up wikibase-rest-api-client version (#27316)
**Description:**

This PR addresses the `TypeError: sequence item 0: expected str
instance, FluentValue found` error when invoking `WikidataQueryRun`. The
root cause was an incompatible version of the
`wikibase-rest-api-client`, which caused the tool to fail when handling
`FluentValue` objects instead of strings.

The current implementation only supports `wikibase-rest-api-client<0.2`,
but the latest version is `0.2.1`, where the current implementation
breaks. Additionally, the error message advises users to install the
latest version: [code
reference](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/wikidata.py#L125C25-L125C32).
Therefore, this PR updates the tool to support the latest version of
`wikibase-rest-api-client`.

Key changes:
- Updated the handling of `FluentValue` objects to ensure compatibility
with the latest `wikibase-rest-api-client`.
- Removed the restriction to `wikibase-rest-api-client<0.2` and updated
to support the latest version (`0.2.1`).

**Issue:**

Fixes [#24093](https://github.com/langchain-ai/langchain/issues/24093) –
`TypeError: sequence item 0: expected str instance, FluentValue found`.

**Dependencies:**

- Upgraded `wikibase-rest-api-client` to the latest version to resolve
the issue.

---------

Co-authored-by: peiwen_zhang <peiwen_zhang@email.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 16:22:18 +00:00
André Quintino
a26c786bc5
community: refactor opensearch query constructor to use wildcard instead of match in the contain comparator (#26653)
- **Description:** Changed the comparator to use a wildcard query
instead of match. This modification allows for partial text matching on
analyzed fields, which improves the flexibility of the search by
performing full-text searches that aren't limited to exact matches.
- **Issue:** The previous implementation used a match query, which
performs exact matches on analyzed fields. This approach limited the
search capabilities by requiring the query terms to align with the
indexed text. The modification to use a wildcard query instead addresses
this limitation. The wildcard query allows for partial text matching,
which means the search can return results even if only a portion of the
term matches the text. This makes the search more flexible and suitable
for use cases where exact matches aren't necessary or expected, enabling
broader full-text searches across analyzed fields.
In short, the problem was that match queries were too restrictive, and
the change to wildcard queries enhances the ability to perform partial
matches.
- **Dependencies:** none
- **Twitter handle:** @Andre_Q_Pereira
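
For illustration, the difference between the two query types can be
sketched as plain OpenSearch query DSL dictionaries. This is a sketch of
the general clause shapes only, not the exact output of the query
constructor; the helper names and the `metadata.title` field are
hypothetical.

```python
def match_clause(field, value):
    """Build a match clause: the value is analyzed and compared to indexed terms."""
    return {"match": {field: {"query": value}}}


def wildcard_clause(field, value):
    """Build a wildcard clause: '*' on both sides allows partial matches."""
    return {"wildcard": {field: {"value": f"*{value}*"}}}


before = match_clause("metadata.title", "lang")
after = wildcard_clause("metadata.title", "lang")
```

With the wildcard form, a pattern such as `*lang*` can match a term like
`langchain`, which a match query for `lang` would not.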

---------

Co-authored-by: André Quintino <andre.quintino@tui.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 11:16:34 -05:00
Davi Schumacher
0f9b4bf244
community[patch]: update dynamodb chat history to update instead of overwrite (#22397)
**Description:**
The current implementation of `DynamoDBChatMessageHistory` updates the
`History` attribute for a given chat history record by first extracting
the existing contents into memory, appending the new message, and then
using the `put_item` method to put the record back. This has the effect
of overwriting any additional attributes someone may want to include in
the record, like chat session metadata.

This PR suggests changing from using `put_item` to using `update_item`
instead which will keep any other attributes in the record untouched.
The change is backward compatible since
1. `update_item` is an "upsert" operation, creating the record if it
doesn't already exist, otherwise updating it
2. It only touches the db insert call and passes the exact same
information. The rest of the class is left untouched
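
The difference between the two write styles can be sketched with an
in-memory stand-in. `FakeTable` below is hypothetical and much simpler
than the real DynamoDB API (its method signatures are not boto3's); it
only mimics the replace-versus-upsert semantics described above.

```python
class FakeTable:
    """Hypothetical in-memory table keyed by SessionId."""

    def __init__(self):
        self.items = {}

    def put_item(self, item):
        # Replaces the whole record, dropping attributes not in `item`.
        self.items[item["SessionId"]] = dict(item)

    def update_item(self, key, attribute, value):
        # Upsert: create the record if missing, then set one attribute.
        record = self.items.setdefault(key, {"SessionId": key})
        record[attribute] = value


table = FakeTable()

# Old behavior: the second put_item wipes the extra attribute.
table.put_item({"SessionId": "s1", "History": ["hi"]})
table.items["s1"]["Topic"] = "billing"
table.put_item({"SessionId": "s1", "History": ["hi", "hello"]})
lost = "Topic" not in table.items["s1"]

# New behavior: update_item leaves the extra attribute untouched.
table.items["s1"]["Topic"] = "billing"
table.update_item("s1", "History", ["hi", "hello", "bye"])
kept = table.items["s1"]["Topic"] == "billing"

# Upsert on a session that does not exist yet.
table.update_item("s2", "History", ["first"])
```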

**Dependencies:**
None

**Tests and docs:**
No unit tests currently exist for the `DynamoDBChatMessageHistory`
class. This PR adds the file
`libs/community/tests/unit_tests/chat_message_histories/test_dynamodb_chat_message_history.py`
to test the `add_message` and `clear` methods. I wanted to use the moto
library to mock DynamoDB calls but I could not get poetry to resolve it
so I mocked those calls myself in the test. Therefore, no test
dependencies were added.

The change was tested on a test DynamoDB table as well. The first three
images below show the current behavior. First a message is added to chat
history, then a value is inserted in the record in some other attribute,
and finally another message is added to the record, destroying the other
attribute.

![using_put_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/426acd62-fe29-42f4-b75f-863fb8b3fb21)

![using_put_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/f8a1c864-7114-4fe3-b487-d6f9252f8f92)

![using_put_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/8b691e08-755e-4877-8969-0e9769e5d28a)

The next three images show the new behavior. Once again a value is added
to an attribute other than the History attribute, but now when the
followup message is added it does not destroy that other attribute. The
History attribute itself is unaffected by this change.

![using_update_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/3e0d76ed-637e-41cd-82c7-01a86c468634)

![using_update_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/52585f9b-71a2-43f0-9dfc-9935aa59c729)

![using_update_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/f94c8147-2d6f-407a-9a0f-86b94341abff)

The doc located at `docs/docs/integrations/memory/aws_dynamodb.ipynb`
required no changes and was tested as well.
2024-12-16 10:38:00 -05:00
Christophe Bornet
6ddd5dbb1e
community: Add FewShotSQLTool (#28232)
The `FewShotSQLTool` gets some SQL query examples from a
`BaseExampleSelector` for a given question.
This is useful to provide [few-shot
examples](https://python.langchain.com/docs/how_to/sql_prompting/#few-shot-examples)
capability to an SQL agent.

Example usage:
```python
from langchain.agents.agent_toolkits.sql.prompt import SQL_PREFIX

embeddings = OpenAIEmbeddings()

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    embeddings,
    AstraDB,
    k=5,
    input_keys=["input"],
    collection_name="lc_few_shots",
    token=ASTRA_DB_APPLICATION_TOKEN,
    api_endpoint=ASTRA_DB_API_ENDPOINT,
)

few_shot_sql_tool = FewShotSQLTool(
    example_selector=example_selector,
    description="Input to this tool is the input question, output is a few SQL query examples related to the input question. Always use this tool before checking the query with sql_db_query_checker!"
)

agent = create_sql_agent(
    llm=llm, 
    db=db, 
    prefix=SQL_PREFIX + "\nYou MUST get some example queries before creating the query.", 
    extra_tools=[few_shot_sql_tool]
)

result = agent.invoke({"input": "How many artists are there?"})
```

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 15:37:21 +00:00
Mohammad Mohtashim
8d746086ab
Added bind_tools support for ChatMLX along with small fix in _stream (#28743)
- **Description:** Added support for `bind_tools` as requested in the
issue. Two issues in `_stream` were also fixed:
    - Corrected the positional argument passing for `generate_step`
    - Handled the case where the `token` returned by `generate_step` is an integer.
- **Issue:** #28692
2024-12-16 09:52:49 -05:00
Jorge Piedrahita Ortiz
558b65ea32
community: SamabaStudio Tool Calling and Structured Output (#28025)
Description: Add tool calling and structured output support for
SambaStudio chat models, docs included

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 06:15:19 +00:00
clairebehue
fb44e74ca4
community: fix AzureSearch Oauth with azure_ad_access_token (#26995)
**Description:** 
AzureSearch vector store: create a wrapper class around
`azure.core.credentials.TokenCredential` (which cannot be instantiated
directly) to fix OAuth usage with the `azure_ad_access_token` argument

**Issue:** [the issue it
fixes](https://github.com/langchain-ai/langchain/issues/26216)

 **Dependencies:** None

- [x] **Lint and test**

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 05:56:45 +00:00
SirSmokeAlot
29305cd948
community: O365Toolkit - send_event - fixed timezone error (#25876)
**Description**: Fixed the formatting of the start and end times
**Issue**: The old formatting always resulted in a timezone error
**Dependencies**: /
**Twitter handle**: /

---------

Co-authored-by: Yannick Opitz <yannick.opitz@gob.de>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 05:32:28 +00:00
Erick Friis
4f6ccb7080
text-splitters: extended-tests without socket (#28736) 2024-12-16 05:19:50 +00:00
Erick Friis
8ec1c72e03
text-splitters: test without socket (#28732) 2024-12-15 22:10:35 +00:00
Aayush Kataria
d417e4b372
Community: Azure CosmosDB No Sql Vector Store: Full Text and Hybrid Search Support (#28716)
- Added [full
text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search)
and [hybrid
search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search)
support for Azure CosmosDB NoSql Vector Store
- Added a new enum called CosmosDBQueryType which supports the following
values:
    - VECTOR = "vector"
    - FULL_TEXT_SEARCH = "full_text_search"
    - FULL_TEXT_RANK = "full_text_rank"
    - HYBRID = "hybrid"
- Users now need to provide this query_type to the similarity_search
method for the vectorStore to make the correct query API call.
- Added a couple of workarounds, since we don't support parameterized
queries for the FULL_TEXT_RANK and HYBRID query functions right now. I
have added TODOs in place and will remove these workarounds by the end of
January.
- Added necessary test cases and updated the 


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-15 13:26:32 -08:00
Mohammad Mohtashim
4c1871d9a8
community: Passing the model_kwargs correctly while maintaing backward compatability (#28439)
- **Description:** `model_kwargs` was not being passed correctly to
`sentence_transformers.SentenceTransformer`. This has been corrected
while maintaining backward compatibility.
- **Issue:** #28436

---------

Co-authored-by: MoosaTae <sadhis.tae@gmail.com>
Co-authored-by: Sadit Wongprayon <101176694+MoosaTae@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-15 20:34:29 +00:00
nhols
a3851cb3bc
community: FAISS vectorstore - consistent Document id field (#28728)
Make sure the id field of Documents in the `FAISS` docstore matches the
values in `index_to_docstore_id`, and implement the `get_by_ids` method.
2024-12-15 12:23:49 -08:00
Bagatur
a0534ae62a
community[patch]: Release 0.3.12 (#28725) 2024-12-14 22:13:20 +00:00
Bagatur
089e659e03
langchain[patch]: Release 0.3.12 (#28724) 2024-12-14 20:02:18 +00:00
Bagatur
679e3a9970
text-splitters[patch]: Release 0.3.3 (#28723) 2024-12-14 19:20:22 +00:00
Erick Friis
387284c259
core: release 0.3.25 (#28718) 2024-12-14 02:22:28 +00:00
Nawaf Alharbi
decd77c515
community: fix an issue with deepinfra integration (#28715)
- [x] **PR title**: langchain: add URL parameter to ChatDeepInfra class

- [x] **PR message**: add URL parameter to ChatDeepInfra class
- **Description:** This PR introduces a url parameter to the
ChatDeepInfra class in LangChain, allowing users to specify a custom
URL. Previously, the URL for the DeepInfra API was hardcoded to
"https://stage.api.deepinfra.com/v1/openai/chat/completions", which
caused issues when the staging endpoint was not functional. The _url
method was updated to return the value from the url parameter, enabling
greater flexibility and addressing the problem.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-14 02:15:29 +00:00
Ben Chambers
008efada2c
[community]: Render documents to graphviz (#24830)
- **Description:** Adds a helper that renders documents with the
GraphVectorStore metadata fields to Graphviz for visualization. This is
helpful for understanding and debugging.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-14 02:02:09 +00:00
Erick Friis
288f204758
docs, community: aerospike docs update (#28717)
Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com>
Co-authored-by: Jesse S <jschmidt@aerospike.com>
Co-authored-by: dylan <dwelch@aerospike.com>
2024-12-14 00:27:37 +00:00
Vimpas
337fed80a5
community: 🐛 PDF Filter Type Error (#27154)
**PR title**: "community: fix PDF Filter Type Error"

- **Description:** fix PDF Filter Type Error
- **Issue:** fixes #27153
- **Dependencies:** none



- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 23:30:29 +00:00
Ryan Parker
12111cb922
community: fallback on core async atransform_documents method for MarkdownifyTransformer (#27866)
# Description
Implements the `atransform_documents` method for
`MarkdownifyTransformer` using the `asyncio` built-in library for
concurrency.

Note that this is mainly for API completeness when working with async
frameworks rather than for performance, since the `markdownify` function
is not I/O bound because it works with `Document` objects already in
memory.
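
As a rough sketch of the general pattern (not the code in this PR), a
synchronous transform can be exposed to async callers by running it in
the event loop's default executor. The `transform` function below is a
hypothetical stand-in for the real conversion.

```python
import asyncio


def transform(text):
    """Hypothetical stand-in for a synchronous, in-memory transform."""
    return text.replace("<b>", "**").replace("</b>", "**")


async def atransform(texts):
    """Run the synchronous transform for each text without blocking the loop."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor.
    futures = [loop.run_in_executor(None, transform, t) for t in texts]
    # gather preserves the order of the inputs.
    return await asyncio.gather(*futures)


results = asyncio.run(atransform(["<b>a</b>", "plain"]))
```

This keeps the async API usable inside async frameworks, even though the
work itself gains little from concurrency.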

# Issue
Fixes #27865

# Dependencies
No new dependencies added, but
[`markdownify`](https://github.com/matthewwithanm/python-markdownify) is
required since this PR updates the `markdownify` integration.

# Tests and docs
- Tests added
- I did not modify the docstrings since they already described the basic
functionality, and [the API docs also already included a
description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents).
If it would be helpful, I would be happy to update the docstrings and/or
the API docs.

# Lint and test
- [x] format
- [x] lint
- [x] test

I ran formatting with `make format`, linting with `make lint`, and
confirmed that tests pass using `make test`. Note that some unit tests
pass in CI but may fail when running `make test`. Those unit tests are:
- `test_extract_html` (and `test_extract_html_async`)
- `test_strip_tags` (and `test_strip_tags_async`)
- `test_convert_tags` (and `test_convert_tags_async`)

The reason for the difference is that there are trailing spaces when the
tests are run in the CI checks, and no trailing spaces when run with
`make test`. I ensured that the tests pass in CI, but they may fail with
`make test` due to the addition of trailing spaces.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 22:32:22 +00:00
Manuel
af2e0a7ede
partners: add 'model' alias for consistency in embedding classes (#28374)
**Description:** This PR introduces a `model` alias for the embedding
classes that contain the attribute `model_name`, to ensure consistency
across the codebase, as suggested by a moderator in a previous PR. The
change aligns the usage of attribute names across the project (see for
example
[here](65deeddd5d/libs/partners/groq/langchain_groq/chat_models.py (L304))).
**Issue:** This PR addresses the suggestion from the review of issue
#28269.
**Dependencies:**  None

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 22:30:00 +00:00
Erick Friis
3107d78517
huggingface: fix standard test lint (#28714) 2024-12-13 22:18:54 +00:00
Kaiwei Zhang
b909d54e70
chroma[patch]: Update logic for assigning ids 2024-12-13 21:58:34 +00:00
Karthik Bharadhwaj
498f0249e2
community[minor]: Opensearch hybridsearch implementation (#25375)
community: add hybrid search in opensearch

# Langchain OpenSearch Hybrid Search Implementation

## Implementation of Hybrid Search: 

I have taken LangChain's OpenSearch integration to the next level by
adding hybrid search capabilities. Building on the existing
OpenSearchVectorSearch class, I have implemented Hybrid Search
functionality (which combines the best of both keyword and semantic
search). This new functionality allows users to harness the power of
OpenSearch's advanced hybrid search features without leaving the
familiar LangChain ecosystem. By blending traditional text matching with
vector-based similarity, the enhanced class delivers more accurate and
contextually relevant results. It's designed to seamlessly fit into
existing LangChain workflows, making it easy for developers to upgrade
their search capabilities.

In implementing the hybrid search for OpenSearch within the LangChain
framework, I also incorporated filtering capabilities. It's important to
note that according to the OpenSearch hybrid search documentation, only
post-filtering is supported for hybrid queries. This means that the
filtering is applied after the hybrid search results are obtained,
rather than during the initial search process.

**Note:** For the implementation of hybrid search, I strictly followed
the official OpenSearch Hybrid search documentation and I took
inspiration from
https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search
Thanks Mate!  

### Experiments

I conducted a few experiments to verify that the hybrid search
implementation is accurate and capable of reproducing the results of
both plain keyword search and vector search.

Experiment - 1
Hybrid Search
Keyword_weight: 1, vector_weight: 0

I conducted an experiment to verify the accuracy of my hybrid search
implementation by comparing it to a plain keyword search. For this test,
I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid
search, effectively giving full weightage to the keyword component. The
results from this hybrid search configuration matched those of a plain
keyword search, confirming that my implementation can accurately
reproduce keyword-only search results when needed. It's important to
note that while the results were the same, the scores differed between
the two methods. This difference is expected because the plain keyword
search in OpenSearch uses the BM25 algorithm for scoring, whereas the
hybrid search still performs both keyword and vector searches before
normalizing the scores, even when the vector component is given zero
weight. This experiment validates that my hybrid search solution
correctly handles the keyword search component and properly applies the
weighting system, demonstrating its accuracy and flexibility in
emulating different search scenarios.


Experiment - 2
Hybrid Search
keyword_weight = 0.0, vector_weight = 1.0

For experiment-2, I took the inverse approach to further validate my
hybrid search implementation. I set the keyword_weight to 0 and the
vector_weight to 1, effectively giving full weightage to the vector
search component (KNN search). I then compared these results with a pure
vector search. The outcome was consistent with my expectations: the
results from the hybrid search with these settings exactly matched those
from a standalone vector search. This confirms that my implementation
accurately reproduces vector search results when configured to do so. As
with the first experiment, I observed that while the results were
identical, the scores differed between the two methods. This difference
in scoring is expected and can be attributed to the normalization
process in hybrid search, which still considers both components even
when one is given zero weight. This experiment further validates the
accuracy and flexibility of my hybrid search solution, demonstrating its
ability to effectively emulate pure vector search when needed while
maintaining the underlying hybrid search structure.



Experiment - 3
Hybrid Search - balanced

keyword_weight = 0.5, vector_weight = 0.5

For experiment-3, I adopted a balanced approach to further evaluate the
effectiveness of my hybrid search implementation. In this test, I set
both the keyword_weight and vector_weight to 0.5, giving equal
importance to keyword-based and vector-based search components. This
configuration aims to leverage the strengths of both search methods
simultaneously. By setting both weights to 0.5, I intended to create a
scenario where the hybrid search would consider lexical matches and
semantic similarity equally. This balanced approach is often ideal for
many real-world applications, as it can capture both exact keyword
matches and contextually relevant results that might not contain the
exact search terms.

Kindly verify the notebook for the experiments conducted!  

**Notebook:**
https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb

### Instructions to follow for Performing Hybrid Search:

**Step-1: Instantiating OpenSearchVectorSearch Class:**
```python
import os

from langchain_community.vectorstores import OpenSearchVectorSearch

# `embedding_model` is assumed to be defined elsewhere.
opensearch_vectorstore = OpenSearchVectorSearch(
    index_name=os.getenv("INDEX_NAME"),
    embedding_function=embedding_model,
    opensearch_url=os.getenv("OPENSEARCH_URL"),
    http_auth=(os.getenv("OPENSEARCH_USERNAME"), os.getenv("OPENSEARCH_PASSWORD")),
    use_ssl=False,
    verify_certs=False,
    ssl_assert_hostname=False,
    ssl_show_warn=False,
)
```

**Parameters:**
1. **index_name:** The name of the OpenSearch index to use.
2. **embedding_function:** The function or model used to generate
embeddings for the documents. It's assumed that embedding_model is
defined elsewhere in the code.
3. **opensearch_url:** The URL of the OpenSearch instance.
4. **http_auth:** A tuple containing the username and password for
authentication.
5. **use_ssl:** Set to False, indicating that the connection to
OpenSearch is not using SSL/TLS encryption.
6. **verify_certs:** Set to False, which means the SSL certificates are
not being verified. This is often used in development environments but
is not recommended for production.
7. **ssl_assert_hostname:** Set to False, disabling hostname
verification in SSL certificates.
8. **ssl_show_warn:** Set to False, suppressing SSL-related warnings.

**Step-2: Configure Search Pipeline:**

To initiate hybrid search functionality, you need to configure a search
pipeline first.

**Implementation Details:**

This method configures a search pipeline in OpenSearch that:
1. Normalizes the scores from both keyword and vector searches using the
min-max technique.
2. Applies the specified weights to the normalized scores.
3. Calculates the final score using an arithmetic mean of the weighted,
normalized scores.
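
The three steps can be sketched in plain Python. This is only an
illustration of the scoring idea, not what OpenSearch runs internally:
the helper names are hypothetical, the two score lists are assumed to
refer to the same documents in the same order, and a constant score list
is normalized to 1.0 as an assumption.

```python
def min_max(scores):
    """Rescale scores to the range [0, 1] with min-max normalization."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption for this sketch: identical scores all normalize to 1.0.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(keyword_scores, vector_scores, keyword_weight, vector_weight):
    """Combine normalized scores with a weighted arithmetic mean."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    total = keyword_weight + vector_weight
    return [
        (keyword_weight * k + vector_weight * v) / total
        for k, v in zip(kw, vec)
    ]


# Hypothetical raw scores for three documents.
keyword_scores = [12.0, 6.0, 3.0]
vector_scores = [0.2, 0.9, 0.5]
combined = hybrid_scores(keyword_scores, vector_scores, 0.3, 0.7)
```

With weights 0.3 and 0.7, the second document ends up with the highest
combined score because it has the best vector score.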


**Parameters:**

* **pipeline_name (str):** A unique identifier for the search pipeline.
It's recommended to use a descriptive name that indicates the weights
used for keyword and vector searches.
* **keyword_weight (float):** The weight assigned to the keyword search
component. This should be a float value between 0 and 1. In this
example, 0.3 gives 30% importance to traditional text matching.
* **vector_weight (float):** The weight assigned to the vector search
component. This should be a float value between 0 and 1. In this
example, 0.7 gives 70% importance to semantic similarity.

```python
opensearch_vectorstore.configure_search_pipelines(
    pipeline_name="search_pipeline_keyword_0.3_vector_0.7",
    keyword_weight=0.3,
    vector_weight=0.7,
)
```

**Step-3: Performing Hybrid Search:**

After creating the search pipeline, you can perform a hybrid search
using the `similarity_search()` method or any other search method
supported by `langchain`. The search combines keyword-based and
semantic similarity searches on your OpenSearch index, leveraging the
strengths of both traditional information retrieval and vector embedding
techniques.

**Parameters:**
* **query:** The search query string.
* **k:** The number of top results to return (in this case, 3).
* **search_type:** Set to `hybrid_search` to use both keyword and vector
search capabilities.
* **search_pipeline:** The name of the previously created search
pipeline.

```python
query = "what are the country named in our database?"

top_k = 3

pipeline_name = "search_pipeline_keyword_0.3_vector_0.7"

matched_docs = opensearch_vectorstore.similarity_search_with_score(
    query=query,
    k=top_k,
    search_type="hybrid_search",
    search_pipeline=pipeline_name,
)

matched_docs
```

twitter handle: @iamkarthik98

---------

Co-authored-by: Karthik Kolluri <karthik.kolluri@eidosmedia.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 16:34:12 -05:00
Philippe PRADOS
f3fb5a9c68
community[minor]: Fix json._validate_metadata_func() (#22842)
`JSONLoader`, in `_validate_metadata_func()`, checks the consistency of
the `_metadata_func()` function. To do this, it invokes the function and
makes sure it returns a dictionary. However, the validation call does
not pass the same arguments as the later, real calls, as shown on line
100. This generates errors if, for example, the function looks like this:
```python
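from typing import Any, Dict

from langchain_community.document_loaders import JSONLoader


# `url` and `file_path` are assumed to be defined elsewhere.
def generate_metadata(json_node: Dict[str, Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "source": url,
        "row": kwargs["seq_num"],  # fails if validation passes no "seq_num"
        "question": json_node.get("question"),
    }


loader = JSONLoader(
    file_path=file_path,
    content_key="answer",
    jq_schema=".[]",
    metadata_func=generate_metadata,
    text_content=False,
)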
```
To avoid this, the verification must comply with the specifications.
This patch does just that.
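
A minimal sketch of what a compliant check could look like, assuming the real calls hand the metadata function a dictionary containing `source` and `seq_num`. The helper name `validate_metadata_func` and the sample values are hypothetical; this is not the patched loader code.

```python
def validate_metadata_func(metadata_func, sample, source="sample.json"):
    """Call metadata_func the way the real calls are assumed to, then check the type."""
    # Assumption: real calls pass a dict that already holds "source" and "seq_num".
    sample_metadata = {"source": source, "seq_num": 1}
    result = metadata_func(sample, sample_metadata)
    if not isinstance(result, dict):
        raise ValueError(
            f"Expected metadata_func to return a dict, got {type(result).__name__}"
        )
    return result


def generate_metadata(json_node, kwargs):
    # Reads "seq_num", so it would raise KeyError if validated with an empty dict.
    return {"row": kwargs["seq_num"], "question": json_node.get("question")}


checked = validate_metadata_func(generate_metadata, {"question": "q1"})
```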

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 21:24:20 +00:00
Keiichi Hirobe
67fd554512
core[patch]: throw exception indexing code if deletion fails in vectorstore (#28103)
The delete methods in the VectorStore and DocumentIndex interfaces
return a status indicating the result. Therefore, we can assume that
their implementations don't throw exceptions but instead return a result
indicating whether the delete operations have failed. The current
implementation doesn't check the returned value, so I modified it to
throw an exception when the operation fails.
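
The idea can be sketched as follows. `IndexingError`, `FlakyStore`, and `delete_or_raise` are hypothetical names for this illustration, and the sketch assumes a `delete` that returns `False` on failure; the actual interfaces may report failure differently.

```python
from typing import List, Optional


class IndexingError(Exception):
    """Hypothetical exception type for this sketch."""


class FlakyStore:
    """Stand-in store whose delete() reports failure instead of raising."""

    def delete(self, ids: List[str]) -> Optional[bool]:
        return False


def delete_or_raise(store, ids: List[str]) -> None:
    result = store.delete(ids)
    # Treat an explicit False as failure; None is assumed to mean "not reported".
    if result is False:
        raise IndexingError(f"Failed to delete {len(ids)} document(s)")


try:
    delete_or_raise(FlakyStore(), ["a", "b"])
    raised = False
except IndexingError:
    raised = True
```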

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 16:14:27 -05:00
Keiichi Hirobe
258b3be5ec
core[minor]: add new clean up strategy "scoped_full" to indexing (#28505)
~Note that this PR is now Draft, so I didn't add change to `aindex`
function and didn't add test codes for my change.
After we have an agreement on the direction, I will add commits.~

`batch_size` is very difficult to decide because setting a large number
like >10000 will impact VectorDB and RecordManager, while setting a
small number will delete records unnecessarily, leading to redundant
work, as the `IMPORTANT` section says.
On the other hand, we can't use `full` because the loader returns just a
subset of the dataset in our use case.

I guess many people are in the same situation as us.

So, as one of the possible solutions for it, I would like to introduce a
new argument, `scoped_full_cleanup`.
This argument will be valid only when `cleanup` is Full. If True, Full
cleanup deletes all documents that haven't been updated AND that are
associated with source ids that were seen during indexing. Default is
False.
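
To make the difference concrete, here is a small sketch of the selection rule described above. The record layout (`source_id`, `updated_at`) and the function name `keys_to_delete` are assumptions for illustration, not the indexing API.

```python
def keys_to_delete(records, seen_source_ids, index_start, scoped):
    """Return keys of stale records; when scoped, only for sources seen in this run."""
    stale = [
        (key, rec) for key, rec in records.items() if rec["updated_at"] < index_start
    ]
    if scoped:
        stale = [(k, r) for k, r in stale if r["source_id"] in seen_source_ids]
    return sorted(k for k, _ in stale)


records = {
    "doc1": {"source_id": "a.txt", "updated_at": 5},  # refreshed in this run
    "doc2": {"source_id": "a.txt", "updated_at": 1},  # stale, source was seen
    "doc3": {"source_id": "b.txt", "updated_at": 1},  # stale, source not loaded
}

full_result = keys_to_delete(records, {"a.txt"}, index_start=3, scoped=False)
scoped_result = keys_to_delete(records, {"a.txt"}, index_start=3, scoped=True)
```

Here plain Full cleanup would also delete `doc3`, even though its source was simply not part of the loaded subset, while the scoped variant only deletes `doc2`.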

This change keeps backward compatibility.

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 20:35:25 +00:00
Eugene Yurtsev
ce90b25313
core[patch]: Update error message in indexing code for unreachable code assertion (#28712)
Minor update for error message that should never be triggered
2024-12-13 20:21:14 +00:00
Keiichi Hirobe
da28cf1f54
core[patch]: Reverts PR #25754 and add unit tests (#28702)
I reported the bug 2 weeks ago here:
https://github.com/langchain-ai/langchain/issues/28447

I believe this is a critical bug for the indexer, so I submitted a PR to
revert the change and added unit tests to prevent similar bugs from
being introduced in the future.

@eyurtsev Could you check this?
2024-12-13 15:13:06 -05:00
ScriptShi
b0a298894d
community[minor]: Add TablestoreVectorStore (#25767)
- [x] **PR title**:  community: add TablestoreVectorStore



- [x] **PR message**: 
    - **Description:** add TablestoreVectorStore
    - **Dependencies:** none


- [x] **Add tests and docs**: If you're adding a new integration, please
include
  1. a test for the integration: yes
  2. an example notebook showing its use: yes


---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-12-13 11:17:28 -08:00
Erick Friis
86b3c6e81c
community: make old stub for QuerySQLDataBaseTool private to skip api ref (#28711) 2024-12-13 10:43:23 -08:00
Martin Triska
05ebe1e66b
Community: add modified_since argument to O365BaseLoader (#28708)
## What are we doing in this PR
We're adding an optional `modified_since` argument to `O365BaseLoader`.
When set, the O365 loader will only load documents newer than the
`modified_since` datetime.

## Why?
OneDrive and SharePoint sites can contain a large number of documents.
The current approach is to download and parse all files and let the
indexer deal with duplicates. This can be prohibitively time-consuming,
especially when using an OCR-based parser like
[zerox](fa06188834/libs/community/langchain_community/document_loaders/pdf.py (L948)).
This argument allows skipping documents that are older than the known
time of indexing.

_Q: What if a file was modified during the last indexing process?
A: Users can set `modified_since` conservatively, and the indexer will
still take care of duplicates._
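
The filtering idea can be illustrated with a small standalone sketch. The file records and the helper `filter_modified_since` are hypothetical; the loader's actual objects and comparison may differ.

```python
from datetime import datetime, timezone


def filter_modified_since(files, modified_since=None):
    """Keep only files modified after `modified_since`; keep all if it is None."""
    if modified_since is None:
        return list(files)
    return [f for f in files if f["modified"] > modified_since]


files = [
    {"name": "old.pdf", "modified": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"name": "new.pdf", "modified": datetime(2024, 12, 1, tzinfo=timezone.utc)},
]

# Choosing a timestamp a little earlier than the last indexing run is the
# "conservative" setting mentioned above.
last_indexed = datetime(2024, 6, 1, tzinfo=timezone.utc)
to_load = [f["name"] for f in filter_modified_since(files, last_indexed)]
```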





---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 17:30:17 +00:00
Bagatur
fa06188834
community[patch]: fix QuerySQLDatabaseTool name (#28659)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-12 19:16:03 -08:00
Erick Friis
48ab91b520
docs: more useful vercel warnings (#28699) 2024-12-13 03:07:24 +00:00
Michael Chin
28cb2cefc6
docs: Fix stack diagram in community README (#28685)
- **Description:** The stack diagram illustration in the community
README fails to render due to an invalid branch reference. This PR
replaces the broken image link with a valid one referencing the master
branch.
2024-12-12 13:33:50 -08:00
Botong Zhu
13c3c4a210
community: fixes json loader not getting texts with json standard (#27327)
This PR fixes `JSONLoader._get_text` not converting objects to JSON
strings correctly.
If an object is serializable and is not a dict, JSONLoader uses Python's
built-in `str()` to convert it to a string. The resulting string may not
follow the JSON standard. For example, a list is converted to a string
with single quotes, and `json.loads` raises an error when it tries to
load that string.
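
The difference is easy to reproduce with the standard library alone. This snippet only demonstrates the `str()` versus `json.dumps()` behavior and is not the loader code.

```python
import json

value = ["a", "b"]

as_str = str(value)          # Python repr, uses single quotes
as_json = json.dumps(value)  # JSON text, uses double quotes

try:
    json.loads(as_str)
    str_round_trips = True
except json.JSONDecodeError:
    str_round_trips = False

json_round_trips = json.loads(as_json) == value
```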

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 19:33:45 +00:00
Lorenzo
4149c0dd8d
community: add method to create branch and list files for gitlab tool (#27883)
### About

- **Description:** The Gitlab utilities used by the Gitlab tool have no
methods to create branches or to list branches and files, although the
Github utilities already provide them.
- **Issue:** None
- **Dependencies:** None

This pull request adds the methods:
- create_branch
- list_branches_in_repo
- set_active_branch
- list_files_in_main_branch
- list_files_in_bot_branch
- list_files_from_directory

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 19:11:35 +00:00
Prathamesh Nimkar
ca054ed1b1
community: ChatSnowflakeCortex - Add streaming functionality (#27753)
Description:

snowflake.py
- Add `_stream` and `_stream_content` methods to enable streaming
functionality
- Fix pydantic issues and add functionality with the overall langchain
version upgrade
- Add `bind_tools` method for agentic workflows support through langgraph
- Update the `_generate` method to account for agentic workflows support
through langgraph
- Cosmetic changes to comments and if conditions

snowflake.ipynb
- Add `_stream` example
- Cosmetic changes to comments
- Fix lint errors

check_pydantic.sh
- Decrease counter from 126 to 125 as suggested when formatting

---------

Co-authored-by: Prathamesh Nimkar <prathamesh.nimkar@snowflake.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 18:35:40 -08:00
Wang, Yi
d834c6b618
huggingface: fix tool argument serialization in _convert_TGI_message_to_LC_message (#26075)
Currently `_convert_TGI_message_to_LC_message` replaces `'` in the tool
arguments, so an argument like "It's" will be converted to `It"s` and
could cause a json parser to fail.
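
A short standalone reproduction of the failure mode, assuming the arguments are turned into a string with `str()` before the quote replacement. It illustrates the problem and the `json.dumps` alternative; it is not the converter's actual code.

```python
import json

arguments = {"text": "It's sunny"}

# Naive conversion: Python repr, then swap every single quote for a double quote.
naive = str(arguments).replace("'", '"')

try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

# json.dumps produces valid JSON and leaves the apostrophe untouched.
safe = json.dumps(arguments)
round_tripped = json.loads(safe)
```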

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-12-11 18:34:32 -08:00
Lakindu Boteju
5a31792bf1
community: Add support for cross-region inference profile IDs in Bedrock Anthropic Claude token cost calculation (#28167)
This change modifies the token cost calculation logic to support
cross-region inference profile IDs for Anthropic Claude models. Instead
of explicitly listing all regional variants of new inference profile IDs
in the cost dictionaries, the code now extracts a base model ID from the
input model ID (or inference profile ID), making it more maintainable
and automatically supporting new regional variants.

These inference profile IDs follow the format:
`<region>.<vendor>.<model-name>` (e.g.,
`us.anthropic.claude-3-haiku-xxx`, `eu.anthropic.claude-3-sonnet-xxx`).
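
One way to derive the base model ID from such an identifier is sketched below. The function name `base_model_id` and the placeholder cost table are hypothetical, and the sketch assumes the vendor segment is always `anthropic`; the actual extraction logic in the PR may differ.

```python
def base_model_id(model_id: str, vendor: str = "anthropic") -> str:
    """Strip a leading region segment such as 'us.' or 'eu.' if one is present."""
    marker = f"{vendor}."
    idx = model_id.find(marker)
    # idx == 0 means there is no region prefix; idx == -1 means another vendor.
    return model_id[idx:] if idx > 0 else model_id


# Placeholder table keyed by base model ID (the value is made up).
COST_PER_1K_INPUT_TOKENS = {"anthropic.claude-3-haiku-xxx": 0.0}

profile_id = "us.anthropic.claude-3-haiku-xxx"
resolved = base_model_id(profile_id)
has_price = resolved in COST_PER_1K_INPUT_TOKENS
```

Keying the cost table by base model ID means a new regional variant needs no new table entry.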

Cross-region inference profiles are system-defined identifiers that
enable distributing model inference requests across multiple AWS
regions. They help manage unplanned traffic bursts and enhance
resilience during peak demands without additional routing costs.

References for Amazon Bedrock's cross-region inference profiles:
- https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html
- https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 02:33:50 +00:00
fatmelon
d1e0ec7b55
community: VectorStores: Azure Cosmos DB Mongo vCore with DiskANN (#27329)
# Description
Add a new vector index type `diskann` to the Azure Cosmos DB Mongo vCore
vector store. The DiskANN paper is available here: [DiskANN: Fast Accurate
Billion-point Nearest Neighbor Search on a Single
Node](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf).

## Sample Usage
```python
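import os

from langchain_community.vectorstores.azure_cosmos_db import (
    AzureCosmosDBVectorSearch,
    CosmosDBSimilarityType,
    CosmosDBVectorSearchType,
)
from pymongo import MongoClient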

# INDEX_NAME = "izzy-test-index-2"
# NAMESPACE = "izzy_test_db.izzy_test_collection"
# DB_NAME, COLLECTION_NAME = NAMESPACE.split(".")

client: MongoClient = MongoClient(CONNECTION_STRING)
collection = client[DB_NAME][COLLECTION_NAME]

model_deployment = os.getenv(
    "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada"
)
model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002")

vectorstore = AzureCosmosDBVectorSearch.from_documents(
    docs,
    openai_embeddings,
    collection=collection,
    index_name=INDEX_NAME,
)

# Read more about these variables in detail here. https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search
maxDegree = 40
dimensions = 1536
similarity_algorithm = CosmosDBSimilarityType.COS
kind = CosmosDBVectorSearchType.VECTOR_DISKANN
lBuild = 20

vectorstore.create_index(
    dimensions=dimensions,
    similarity=similarity_algorithm,
    kind=kind,
    max_degree=maxDegree,
    l_build=lBuild,
)
```

## Dependencies
No additional dependencies were added

---------

Co-authored-by: Yang Qiao (from Dev Box) <yangqiao@microsoft.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 01:54:04 +00:00
manukychen
ba9b95cd23
Community: Adding bulk_size as a setable param for OpenSearchVectorSearch (#28325)
Description:
When using `langchain.retrievers.parent_document_retriever.py` with
`OpenSearchVectorSearch` as the vectorstore, I found that the `bulk_size`
param I passed into the `OpenSearchVectorSearch` class did not take
effect in `ParentDocumentRetriever.add_documents()`. It was overwritten
with the int 500 that the `OpenSearchVectorSearch` functions use (e.g.,
`add_texts()`, `add_embeddings()`...).

So I made this PR to fix this, thanks!

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 01:45:22 +00:00
xintoteai
45f9c9ae88
langchain: fixed weaviate (v4) vectorstore import for self-query retriever (#28675)
Co-authored-by: Xin Heng <xin.heng@gmail.com>
2024-12-11 15:53:41 -08:00
Thomas van Dongen
ee640d6bd3
community: fixed bug in model2vec embedding code (#28670)
This PR fixes a bug with the current implementation for Model2Vec
embeddings where `embed_documents` does not work as expected.

- **Description**: the current implementation uses `encode_as_sequence`
for encoding documents. This is incorrect, as `encode_as_sequence`
creates token embeddings and not mean embeddings. The normal `encode`
function handles both single and batched inputs and should be used
instead. The return type was also incorrect, as encode returns a NumPy
array. This PR converts the embedding to a list so that the output is
consistent with the Embeddings ABC.
2024-12-11 15:50:56 -08:00
Brian Sharon
b20230c800
community: use correct id_key when deleting by id in LanceDB wrapper (#28655)
- **Description:** The current version of the `delete` method assumes
that the id field will always be called `id`.
- **Issue:** n/a
- **Dependencies:** n/a
- **Twitter handle:** ugh, Twitter :D 

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 23:49:35 +00:00
Mohammad Mohtashim
fa155a422f
[Community]: requests_kwargs not being used in _fetch (#28646)
- **Description:** `requests_kwargs` is not being passed to `_fetch`,
which fetches pages asynchronously. This PR makes sure that
`requests_kwargs` is passed to `_fetch`, just as it is to `_scrape`.
- **Issue:** #28634

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 23:46:54 +00:00
Mohammad Mohtashim
a37afbe353
mistral[minor]: Added Retrying Mechanism in case of Request Rate Limit Error for MistralAIEmbeddings (#27818)
- **Description:** In the event of a Rate Limit Error from the
MistralAI server, the response JSON raises a KeyError. To address this,
a simple retry mechanism has been implemented to handle cases where the
request limit is exceeded.
  - **Issue:** #27790
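
A simple retry loop of the kind described might look like the sketch below. `RateLimitError`, `call_with_retries`, and `flaky_embed` are hypothetical names, and the exponential backoff is an assumption; the actual change may rely on a retry library and different settings.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the server reports a rate limit."""


def call_with_retries(func, max_retries=3, base_delay=0.0):
    """Call func, retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))


calls = {"count": 0}


def flaky_embed():
    """Stand-in for an embeddings request that is rate limited twice."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("rate limit exceeded")
    return [0.1, 0.2, 0.3]


embedding = call_with_retries(flaky_embed)
```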

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-11 17:53:42 -05:00
Vincent Zhang
df5008fe55
community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators (#28207)
## Description
We are submitting as a team of four for a project. Other team members
are @RuofanChen03, @LikeWang10067, @TANYAL77.

This pull request expands the filtering capabilities of the FAISS
vectorstore by adding the MongoDB-style query operators listed below,
and includes comprehensive testing for the added functionality.
- $eq (equals)
- $neq (not equals)
- $gt (greater than)
- $lt (less than)
- $gte (greater than or equal)
- $lte (less than or equal)
- $in (membership in list)
- $nin (not in list)
- $and (all conditions must match)
- $or (any condition must match)
- $not (negation of condition)
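
To show how operators like these compose, here is a small standalone evaluator over a metadata dictionary. It is a sketch written for this description, not the FAISS vectorstore implementation; the function name `matches` and the `amount` field are made up.

```python
import operator

COMPARATORS = {
    "$eq": operator.eq,
    "$neq": operator.ne,
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$gte": operator.ge,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata, flt):
    """Return True if metadata satisfies every condition in the filter dict."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)
        elif isinstance(cond, dict):
            # Field with operators, e.g. {"amount": {"$gte": 100}}.
            # Ordering comparisons assume the field is present in metadata.
            ok = all(
                COMPARATORS[op](metadata.get(key), operand)
                for op, operand in cond.items()
            )
        else:
            ok = metadata.get(key) == cond  # bare value means equality
        if not ok:
            return False
    return True


doc = {"schema_type": "financial", "handler_type": "payment", "amount": 120}
combined_filter = {
    "$and": [
        {"schema_type": {"$eq": "financial"}},
        {"handler_type": {"$in": ["payment", "refund"]}},
    ]
}
is_match = matches(doc, combined_filter)
is_excluded = matches(doc, {"$not": {"amount": {"$gte": 100}}})
```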


## Issue
This closes https://github.com/langchain-ai/langchain/issues/26379.


## Sample Usage
```python
import faiss
import asyncio
from langchain_community.vectorstores import FAISS
from langchain.schema import Document
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
documents = [
    Document(page_content="Process customer refund request", metadata={"schema_type": "financial", "handler_type": "refund",}),
    Document(page_content="Update customer shipping address", metadata={"schema_type": "customer", "handler_type": "update",}),
    Document(page_content="Process payment transaction", metadata={"schema_type": "financial", "handler_type": "payment",}),
    Document(page_content="Handle customer complaint", metadata={"schema_type": "customer","handler_type": "complaint",}),
    Document(page_content="Process invoice payment", metadata={"schema_type": "financial","handler_type": "payment",})
]

async def search(vectorstore, query, schema_type, handler_type, k=2):
    schema_filter = {"schema_type": {"$eq": schema_type}}
    handler_filter = {"handler_type": {"$eq": handler_type}}
    combined_filter = {
        "$and": [
            schema_filter,
            handler_filter,
        ]
    }
    base_retriever = vectorstore.as_retriever(
        search_kwargs={"k":k, "filter":combined_filter}
    )
    return await base_retriever.ainvoke(query)

async def main():
    vectorstore = FAISS.from_texts(
        texts=[doc.page_content for doc in documents],
        embedding=embeddings,
        metadatas=[doc.metadata for doc in documents]
    )
    
    def printt(title, documents):
        print(title)
        if not documents:
            print("\tNo documents found.")
            return
        for doc in documents:
            print(f"\t{doc.page_content}. {doc.metadata}")

    printt("Documents:", documents)
    printt('\nquery="process payment", schema_type="financial", handler_type="payment":', await search(vectorstore, query="process payment", schema_type="financial", handler_type="payment", k=2))
    printt('\nquery="customer update", schema_type="customer", handler_type="update":', await search(vectorstore, query="customer update", schema_type="customer", handler_type="update", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="refund":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="refund", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="foobar":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="foobar", k=2))
    print()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output
```
Documents:
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Handle customer complaint. {'schema_type': 'customer', 'handler_type': 'complaint'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="process payment", schema_type="financial", handler_type="payment":
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="customer update", schema_type="customer", handler_type="update":
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}

query="refund process", schema_type="financial", handler_type="refund":
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}

query="refund process", schema_type="financial", handler_type="foobar":
	No documents found.

```

---------

Co-authored-by: ruofan chen <ruofan.is.awesome@gmail.com>
Co-authored-by: RickyCowboy <like.wang@mail.utoronto.ca>
Co-authored-by: Shanni Li <tanya.li@mail.utoronto.ca>
Co-authored-by: RuofanChen03 <114096642+ruofanchen03@users.noreply.github.com>
Co-authored-by: Like Wang <102838708+likewang10067@users.noreply.github.com>
2024-12-11 17:52:22 -05:00
like
3048a9a26d
community: tongyi multimodal response format fix to support langchain (#28645)
Description: The multimodal (Tongyi) response format
`"message": {"role": "assistant", "content": [{"text": "图像"}]}}]}`
is not compatible with LangChain.
Dependencies: No

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 21:13:26 +00:00
Bagatur
d0e662e43b
community[patch]: Release 0.3.11 (#28658) 2024-12-10 20:51:13 +00:00
Bagatur
91227ad7fd
langchain[patch]: Release 0.3.11 (#28657) 2024-12-10 12:28:14 -08:00
Bagatur
1fbd86a155
core[patch]: Release 0.3.24 (#28656) 2024-12-10 20:19:21 +00:00
Bagatur
e6a62d8422
core,langchain,community[patch]: allow langsmith 0.2 (#28598) 2024-12-10 18:50:58 +00:00
ccurme
bc4dc7f4b1
ollama[patch]: permit streaming for tool calls (#28654)
Resolves https://github.com/langchain-ai/langchain/issues/28543

Ollama recently
[released](https://github.com/ollama/ollama/releases/tag/v0.4.6) support
for streaming tool calls. Previously we would override the `stream`
parameter if tools were passed in.

Covered in standard tests here:
c1d348e95d/libs/standard-tests/langchain_tests/integration_tests/chat_models.py (L893-L897)

Before, the test generates one message chunk:
```python
[
    AIMessageChunk(
        content='',
        additional_kwargs={},
        response_metadata={
            'model': 'llama3.1',
            'created_at': '2024-12-10T17:49:04.468487Z',
            'done': True,
            'done_reason': 'stop',
            'total_duration': 525471208,
            'load_duration': 19701000,
            'prompt_eval_count': 170,
            'prompt_eval_duration': 31000000,
            'eval_count': 17,
            'eval_duration': 473000000,
            'message': Message(
                role='assistant',
                content='',
                images=None,
                tool_calls=[
                    ToolCall(
                        function=Function(name='magic_function', arguments={'input': 3})
                    )
                ]
            )
        },
        id='run-552bbe0f-8fb2-4105-ada1-fa38c1db444d',
        tool_calls=[
            {
                'name': 'magic_function',
                'args': {'input': 3},
                'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2',
                'type': 'tool_call',
            },
        ],
        usage_metadata={
            'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187
        },
        tool_call_chunks=[
            {
                'name': 'magic_function',
                'args': '{"input": 3}',
                'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2',
                'index': None,
                'type': 'tool_call_chunk',
            }
        ]
    )
]
```

After, it generates two (tool call in one, response metadata in
another):
```python
[
    AIMessageChunk(
        content='',
        additional_kwargs={},
        response_metadata={},
        id='run-9a3f0860-baa1-4bae-9562-13a61702de70',
        tool_calls=[
            {
                'name': 'magic_function',
                'args': {'input': 3},
                'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0',
                'type': 'tool_call',
            },
        ],
        tool_call_chunks=[
            {
                'name': 'magic_function',
                'args': '{"input": 3}',
                'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0',
                'index': None,
                'type': 'tool_call_chunk',
            },
        ],
    ),
    AIMessageChunk(
        content='',
        additional_kwargs={},
        response_metadata={
            'model': 'llama3.1',
            'created_at': '2024-12-10T17:46:43.278436Z',
            'done': True,
            'done_reason': 'stop',
            'total_duration': 514282750,
            'load_duration': 16894458,
            'prompt_eval_count': 170,
            'prompt_eval_duration': 31000000,
            'eval_count': 17,
            'eval_duration': 464000000,
            'message': Message(
                role='assistant', content='', images=None, tool_calls=None
            ),
        },
        id='run-9a3f0860-baa1-4bae-9562-13a61702de70',
        usage_metadata={
            'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187
        }
    ),
]
```
2024-12-10 12:54:37 -05:00
Johannes Mohren
c1d348e95d
doc-loader: retain Azure Doc Intelligence API metadata in Document parser (#28382)
**Description**:
This PR modifies the doc_intelligence.py parser in the community package
to include all metadata returned by the Azure Doc Intelligence API in
the Document object. Previously, only the parsed content (markdown) was
retained, while other important metadata such as bounding boxes (bboxes)
for images and tables was discarded. These image bboxes are crucial for
supporting use cases like multi-modal RAG workflows when using Azure Doc
Intelligence.

The change ensures that all information returned by the Azure Doc
Intelligence API is preserved by setting the metadata attribute of the
Document object to the entire result returned by the API, rather than an
empty dictionary. This extends the parser's utility for complex use
cases without breaking existing functionality.

**Issue**:
This change does not address a specific issue number, but it resolves a
critical limitation in supporting multimodal workflows when using the
LangChain wrapper for the Azure API.

**Dependencies**:
No additional dependencies are required for this change.

---------

Co-authored-by: jmohren <johannes.mohren@aol.de>
2024-12-10 11:22:58 -05:00
Alex Tonkonozhenko
0d20c314dd
Confluence Loader: Fix CQL loading (#27620)
fix #12082

2024-12-10 11:05:23 -05:00
Katarina Supe
aba2711e7f
community: update Memgraph integration (#27017)
**Description:**
- **Memgraph** no longer relies on `Neo4jGraphStore` but **implements
`GraphStore`**, just like other graph databases.
- **Memgraph** no longer relies on `GraphQAChain`, but implements
`MemgraphQAChain`, just like other graph databases.
- The refresh schema procedure has been updated to try using `SHOW
SCHEMA INFO`. The fallback uses Cypher queries (a combination of schema
and Cypher) → **LangChain integration no longer relies on MAGE
library**.
- The **schema structure** has been reformatted. Regardless of the
procedures used to get schema, schema structure is the same.
- The `add_graph_documents()` method has been implemented. It transforms
`GraphDocument` into Cypher queries and creates a graph in Memgraph. It
implements the ability to use `baseEntityLabel` to improve speed
(`baseEntityLabel` has an index on the `id` property). It also
implements the ability to include sources by creating a `MENTIONS`
relationship to the source document.
- Jupyter Notebook for Memgraph has been updated.
- **Issue:** /
- **Dependencies:** /
- **Twitter handle:** supe_katarina (DX Engineer @ Memgraph)

Closes #25606
2024-12-10 10:57:21 -05:00