Commit Graph

6011 Commits

Author SHA1 Message Date
Changyong Um
dc171221b3
community[patch]: Fix vLLM integration to apply lora_request (#27731)
**Description:**
- Add the `lora_request` parameter to the VLLM class to support LoRA
model configurations. This enhancement allows users to specify LoRA
requests directly when using VLLM, enabling more flexible and efficient
model customization.

**Issue:**
- No existing issue for `lora_adapter` in VLLM. This PR addresses the
need for configuring LoRA requests within the VLLM framework.
- Reference : [Using LoRA Adapters in
vLLM](https://docs.vllm.ai/en/stable/models/lora.html#using-lora-adapters)


**Example Code :**
Before this change, the `lora_request` parameter was not applied
correctly:

```python
ADAPTER_PATH = "/path/of/lora_adapter"

llm = VLLM(model="Bllossom/llama-3.2-Korean-Bllossom-3B",
           max_new_tokens=512,
           top_k=2,
           top_p=0.90,
           temperature=0.1,
           vllm_kwargs={
               "gpu_memory_utilization":0.5, 
               "enable_lora":True, 
               "max_model_len":1024,
           }
)

print(llm.invoke(
    ["...prompt_content..."], 
    lora_request=LoRARequest("lora_adapter", 1, ADAPTER_PATH)
    ))
```
**Before Change Output:**
```bash
response was not applied lora_request
```
So, I attempted to apply the lora_adapter to
langchain_community.llms.vllm.VLLM.

**current output:**
```bash
response applied lora_request
```

**Dependencies:**
- None

**Lint and test:**
- All tests and lint checks have passed.

---------

Co-authored-by: Um Changyong <changyong.um@sfa.co.kr>
2024-10-30 13:59:34 +00:00
Qier LU
8d8e38b090
community[pathch]: Add missing custom content_key handling in Redis vector store (#27736)
This fix an error caused by missing custom content_key handling in Redis
vector store in function similarity_search_with_score.
2024-10-30 13:57:20 +00:00
William FH
5a2cfb49e0
Support message trimming on single messages (#27729)
Permit trimming message lists of length 1
2024-10-30 04:27:52 +00:00
Bagatur
5111063af2
langchain[patch]: Release 0.3.5 (#27727) 2024-10-29 17:06:23 -07:00
Bagatur
8f4423e042
text-splitters[patch]: Release 0.3.1 (#27726) 2024-10-30 00:04:48 +00:00
Harsimran-19
c1d8c33df6
core: JsonOutputParser UTF characters bug (#27306)
**Description:**
This PR fixes an issue where non-ASCII characters in Pydantic field
descriptions were being escaped to their Unicode representations when
using `JsonOutputParser`. The change allows non-ASCII characters to be
preserved in the output, which is especially important for multilingual
support and when working with non-English languages.

**Issue:** Fixes #27256

**Example Code:**
```python
from pydantic import BaseModel, Field
from langchain_core.output_parsers import JsonOutputParser

class Article(BaseModel):
    title: str = Field(description="科学文章的标题")

output_data_structure = Article
parser = JsonOutputParser(pydantic_object=output_data_structure)
print(parser.get_format_instructions())
```
**Previous Output**:
```... "title": {"description": "\\u79d1\\u5b66\\u6587\\u7ae0\\u7684\\u6807\\u9898", "title": "Title", "type": "string"}} ...```

**Current Output**:
```... "title": {"description": "科学文章的标题", "title": "Title", "type":
"string"}} ...```

**Changes made**:
- Modified `json.dumps()` call in
`langchain_core/output_parsers/json.py` to use `ensure_ascii=False`
- Added a unit test to verify Unicode handling

Co-authored-by: Harsimran-19 <harsimran1869@gmail.com>
2024-10-29 14:48:53 +00:00
Andrew Effendi
49517cc1e7
partners/huggingface[patch]: fix HuggingFacePipeline model_id parameter (#27514)
**Description:** Fixes issue with model parameter not getting
initialized correctly when passing transformers pipeline
**Issue:** https://github.com/langchain-ai/langchain/issues/25915
2024-10-29 14:34:46 +00:00
Jeong-Minju
0a465b8032
docs: Fix typo in _action_agent docs section (#27698)
PR Title: docs: Fix typo in _action_agent function docs section

Description: In line 1185, _action_agent function's docs, changing
**".agent"** to **"self.agent"**.

Issue: N/A

Dependencies: None

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-10-29 14:16:42 +00:00
Neil Vachharajani
eec35672a4
core[patch]: Improve type checking for the tool decorator (#27460)
**Description:**

When annotating a function with the @tool decorator, the symbol should
have type BaseTool. The previous type annotations did not convey that to
type checkers. This patch creates 4 overloads for the tool function for
the 4 different use cases.

1. @tool decorator with no arguments
2. @tool decorator with only keyword arguments
3. @tool decorator with a name argument (and possibly keyword arguments)
4. Invoking tool as function with a name and runnable positional
arguments

The main function is updated to match the overloads. The changes are
100% backwards compatible (all existing calls should continue to work,
just with better type annotations).

**Twitter handle:** @nvachhar

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-29 13:59:56 +00:00
Erick Friis
583808a7b8
partners/huggingface: release 0.1.1 (#27691) 2024-10-28 13:39:38 -07:00
Erick Friis
6d524e9566
partners/box: release 0.2.2 (#27690) 2024-10-28 12:54:20 -07:00
yahya-mouman
6803cb4f34
openai[patch]: add check for none values when summing token usage (#27585)
**Description:** Fixes None addition issues when an empty value is
passed on

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-10-28 12:49:43 -07:00
Bagatur
ede953d617
openai[patch]: fix schema formatting util (#27685) 2024-10-28 15:46:47 +00:00
Baptiste Pasquier
440c162b8b
community: Fix closed session in Infinity (#26933)
**Description:** 

The `aiohttp.ClientSession` is closed at the end of the with statement,
which causes an error during a second call.

The implemented fix is to define the session directly within the with
block, exactly like in the textembed code:


c6350d636e/libs/community/langchain_community/embeddings/textembed.py (L335-L346)
 
**Issue:** Fix #26932

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-10-27 11:37:21 -04:00
Jorge Piedrahita Ortiz
8895d468cb
community: sambastudio llm refactor (#27215)
**Description:** 
    - Sambastudio LLM refactor 
    - Sambastudio openai compatible API support added
    - docs updated
2024-10-27 11:08:15 -04:00
ccurme
fe87e411f2
groq: fix unit test (#27660) 2024-10-26 14:57:23 -04:00
Erick Friis
fbfc6bdade
core: test runner improvements (#27654)
when running core tests locally this
- prevents langsmith tracing from being enabled by env vars
- prevents network calls
2024-10-25 15:06:59 -07:00
Vincent Min
7bc4e320f1
core[patch]: improve performance of InMemoryVectorStore (#27538)
**Description:** We improve the performance of the InMemoryVectorStore.
**Isue:** Originally, similarity was computed document by document:
```
for doc in self.store.values():
            vector = doc["vector"]
            similarity = float(cosine_similarity([embedding], [vector]).item(0))
```
This is inefficient and does not make use of numpy vectorization.
This PR computes the similarity in one vectorized go:
```
docs = list(self.store.values())
similarity = cosine_similarity([embedding], [doc["vector"] for doc in docs])
```
**Dependencies:** None
**Twitter handle:** @b12_consulting, @Vincent_Min

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-25 17:07:04 -04:00
Bagatur
d5306899d3
openai[patch]: Release 0.2.4 (#27652) 2024-10-25 20:26:21 +00:00
Erick Friis
600b7bdd61
all: test 3.13 ci (#27197)
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-10-25 12:56:58 -07:00
Bagatur
06df15c9c0
core[patch]: Release 0.3.13 (#27651) 2024-10-25 19:22:44 +00:00
Steve Moss
24605bcdb6
community[patch]: Fix missing protected_namespaces(). (#27610)
- [x] **PR message**:
- **Description:** Fixes warning messages raised due to missing
`protected_namespaces` parameter in `ConfigDict`.
    - **Issue:** https://github.com/langchain-ai/langchain/issues/27609
    - **Dependencies:** No dependencies
    - **Twitter handle:** @gawbul
2024-10-25 02:16:26 +00:00
Eugene Yurtsev
7667ee126f
core: remove mustache in extended deps (#27629)
Remove mustache from extended deps -- we vendor the mustache
implementation
2024-10-24 22:12:49 -04:00
Erick Friis
265e0a164a
core: add flake8-bandit (S) ruff rules to core (#27368)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-24 22:33:41 +00:00
Nithish Raghunandanan
0623c74560
couchbase: Add document id to vector search results (#27622)
**Description:** Returns the document id along with the Vector Search
results

**Issue:** Fixes https://github.com/langchain-ai/langchain/issues/26860
for CouchbaseVectorStore


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-24 21:47:36 +00:00
ZhangShenao
455ab7d714
Improvement[Community] Improve Document Loaders and Splitters (#27568)
- Fix word spelling error
- Add static method decorator
- Fix language splitter

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-24 21:42:16 +00:00
CLOVA Studio 개발
846a75284f
community: Add Naver chat model & embeddings (#25162)
Reopened as a personal repo outside the organization.

## Description
- Naver HyperCLOVA X community package 
  - Add chat model & embeddings
  - Add unit test & integration test
  - Add chat model & embeddings docs
- I changed partner
package(https://github.com/langchain-ai/langchain/pull/24252) to
community package on this PR
- Could this
embeddings(https://github.com/langchain-ai/langchain/pull/21890) be
deprecated? We are trying to replace it with embedding
model(**ClovaXEmbeddings**) in this PR.

Twitter handle: None. (if needed, contact with
joonha.jeon@navercorp.com)

---
you can check our previous discussion below:

> one question on namespaces - would it make sense to have these in
.clova namespaces instead of .naver?

I would like to keep it as is, unless it is essential to unify the
package name.
(ClovaX is a branding for the model, and I plan to add other models and
components. They need to be managed as separate classes.)

> also, could you clarify the difference between ClovaEmbeddings and
ClovaXEmbeddings?

There are 3 models that are being serviced by embedding, and all are
supported in the current PR. In addition, all the functionality of CLOVA
Studio that serves actual models, such as distinguishing between test
apps and service apps, is supported. The existing PR does not support
this content because it is hard-coded.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-10-24 20:54:13 +00:00
Hyejun An
6227396e20
partners/HuggingFacePipeline[stream]: Change to use pipeline instead of pipeline.model.generate in stream() (#26531)
## Description

I encountered an error while using the` gemma-2-2b-it model` with the
`HuggingFacePipeline` class and have implemented a fix to resolve this
issue.

### What is Problem

```python
model_id="google/gemma-2-2b-it"


gemma_2_model = AutoModelForCausalLM.from_pretrained(model_id)
gemma_2_tokenizer = AutoTokenizer.from_pretrained(model_id)

gen = pipeline( 
    task='text-generation',
    model=gemma_2_model,
    tokenizer=gemma_2_tokenizer,
    max_new_tokens=1024,
    device=0 if torch.cuda.is_available() else -1,
    temperature=.5,
    top_p=0.7,
    repetition_penalty=1.1,
    do_sample=True,
    )

llm = HuggingFacePipeline(pipeline=gen)

for chunk in llm.stream("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World."):
    print(chunk, end="", flush=True)
```

This code outputs the following error message:

```
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
Exception in thread Thread-19 (generate):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1874, in generate
    self._validate_generated_length(generation_config, input_ids_length, has_default_max_length)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1266, in _validate_generated_length
    raise ValueError(
ValueError: Input length of input_ids is 31, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
```

In addition, the following error occurs when the number of tokens is
reduced.

```python
for chunk in llm.stream("Hello World"):
    print(chunk, end="", flush=True)
```

```
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1885: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
Exception in thread Thread-20 (generate):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 994, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 803, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 164, in forward
    return F.embedding(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
```

On the other hand, in the case of invoke, the output is normal:

```
llm.invoke("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.")
```
```
'Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.\n\nThis is a simple program that prints the phrase "Hello World" to the console. \n\n**Here\'s how it works:**\n\n* **`print("Hello World")`**: This line of code uses the `print()` function, which is a built-in function in most programming languages (like Python). The `print()` function takes whatever you put inside its parentheses and displays it on the screen.\n* **`"Hello World"`**:  The text within the double quotes (`"`) is called a string. It represents the message we want to print.\n\n\nLet me know if you\'d like to explore other programming concepts or see more examples! \n'
```

### Problem Analysis

- Apparently, I put kwargs in while generating pipelines and it applied
to `invoke()`, but it's not applied in the `stream()`.
- When using the stream, `inputs = self.pipeline.tokenizer (prompt,
return_tensors = "pt")` enters cpu.
  - This can crash when the model is in gpu.

### Solution

Just use `self.pipeline` instead of `self.pipeline.model.generate`.

- **Original Code**

```python
stopping_criteria = StoppingCriteriaList([StopOnTokens()])

inputs = self.pipeline.tokenizer(prompt, return_tensors="pt")
streamer = TextIteratorStreamer(
    self.pipeline.tokenizer,
    timeout=60.0,
    skip_prompt=skip_prompt,
    skip_special_tokens=True,
)
generation_kwargs = dict(
    inputs,
    streamer=streamer,
    stopping_criteria=stopping_criteria,
    **pipeline_kwargs,
)
t1 = Thread(target=self.pipeline.model.generate, kwargs=generation_kwargs)
t1.start()
```

- **Updated Code**

```python
stopping_criteria = StoppingCriteriaList([StopOnTokens()])

streamer = TextIteratorStreamer(
    self.pipeline.tokenizer,
    timeout=60.0,
    skip_prompt=skip_prompt,
    skip_special_tokens=True,
)
generation_kwargs = dict(
    text_inputs= prompt,
    streamer=streamer,
    stopping_criteria=stopping_criteria,
    **pipeline_kwargs,
)
t1 = Thread(target=self.pipeline, kwargs=generation_kwargs)
t1.start()
```

By using the `pipeline` directly, the `kwargs` of the pipeline are
applied, and there is no need to consider the `device` of the `tensor`
made with the `tokenizer`.

> According to the change to use `pipeline`, it was modified to put
`text_inputs=prompts` directly into `generation_kwargs`.

## Issue

None

## Dependencies

None

## Twitter handle

None

---------

Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-10-24 16:49:43 -04:00
Bagatur
655ced84d7
openai[patch]: accept json schema response format directly (#27623)
fix #25460

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-24 18:19:15 +00:00
Tibor Reiss
20b56a0233
core[patch]: fix repr and str for Serializable (#26786)
Fixes #26499

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-10-24 08:36:35 -07:00
Lei Zhang
f203229b51
community: Fix the failure of ChatSparkLLM after upgrading to Pydantic V2 (#27418)
**Description:**

The test_sparkllm.py can reproduce this issue.


https://github.com/langchain-ai/langchain/blob/master/libs/community/tests/integration_tests/chat_models/test_sparkllm.py#L66

```
Testing started at 18:27 ...
Launching pytest with arguments test_sparkllm.py::test_chat_spark_llm --no-header --no-summary -q in /Users/zhanglei/Work/github/langchain/libs/community/tests/integration_tests/chat_models

============================= test session starts ==============================
collecting ... collected 1 item

test_sparkllm.py::test_chat_spark_llm 

============================== 1 failed in 0.45s ===============================
FAILED                             [100%]
tests/integration_tests/chat_models/test_sparkllm.py:65 (test_chat_spark_llm)
def test_chat_spark_llm() -> None:
>       chat = ChatSparkLLM(
            spark_app_id="your spark_app_id",
            spark_api_key="your spark_api_key",
            spark_api_secret="your spark_api_secret",
        )  # type: ignore[call-arg]

test_sparkllm.py:67: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../../core/langchain_core/load/serializable.py:111: in __init__
    super().__init__(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'langchain_community.chat_models.sparkllm.ChatSparkLLM'>
values = {'spark_api_key': 'your spark_api_key', 'spark_api_secret': 'your spark_api_secret', 'spark_api_url': 'wss://spark-api.xf-yun.com/v3.5/chat', 'spark_app_id': 'your spark_app_id', ...}

    @model_validator(mode="before")
    @classmethod
    def validate_environment(cls, values: Dict) -> Any:
        values["spark_app_id"] = get_from_dict_or_env(
            values,
            ["spark_app_id", "app_id"],
            "IFLYTEK_SPARK_APP_ID",
        )
        values["spark_api_key"] = get_from_dict_or_env(
            values,
            ["spark_api_key", "api_key"],
            "IFLYTEK_SPARK_API_KEY",
        )
        values["spark_api_secret"] = get_from_dict_or_env(
            values,
            ["spark_api_secret", "api_secret"],
            "IFLYTEK_SPARK_API_SECRET",
        )
        values["spark_api_url"] = get_from_dict_or_env(
            values,
            "spark_api_url",
            "IFLYTEK_SPARK_API_URL",
            SPARK_API_URL,
        )
        values["spark_llm_domain"] = get_from_dict_or_env(
            values,
            "spark_llm_domain",
            "IFLYTEK_SPARK_LLM_DOMAIN",
            SPARK_LLM_DOMAIN,
        )
    
        # put extra params into model_kwargs
        default_values = {
            name: field.default
            for name, field in get_fields(cls).items()
            if field.default is not None
        }
>       values["model_kwargs"]["temperature"] = default_values.get("temperature")
E       KeyError: 'model_kwargs'

../../../langchain_community/chat_models/sparkllm.py:368: KeyError
``` 

I found that when upgrading to Pydantic v2, @root_validator was changed
to @model_validator. When a class declares multiple
@model_validator(model=before), the execution order in V1 and V2 is
opposite. This is the reason for ChatSparkLLM's failure.

The correct execution order is to execute build_extra first.


https://github.com/langchain-ai/langchain/blob/langchain%3D%3D0.2.16/libs/community/langchain_community/chat_models/sparkllm.py#L302

And then execute validate_environment.


https://github.com/langchain-ai/langchain/blob/langchain%3D%3D0.2.16/libs/community/langchain_community/chat_models/sparkllm.py#L329

The Pydantic community also discusses it, but there hasn't been a
conclusion yet. https://github.com/pydantic/pydantic/discussions/7434

**Issus:** #27416 

**Twitter handle:** coolbeevip

---------

Co-authored-by: vbarda <vadym@langchain.dev>
2024-10-23 21:17:10 -04:00
Andrew Effendi
8f151223ad
Community: Fix DuckDuckGo search tool Output Format (#27479)
**Issue:** : https://github.com/langchain-ai/langchain/issues/22961
   **Description:** 

Previously, the documentation for `DuckDuckGoSearchResults` said that it
returns a JSON string, however the code returns a regular string that
can't be parsed as is.
for example running

```python
from langchain_community.tools import DuckDuckGoSearchResults

# Create a DuckDuckGo search instance
search = DuckDuckGoSearchResults()

# Invoke the search
result = search.invoke("Obama")

# Print the result
print(result)
# Print the type of the result
print("Result Type:", type(result))
```
will return
```
snippet: Harris will hold a campaign event with former President Barack Obama in Georgia next Thursday, the first time the pair has campaigned side by side, a senior campaign official said. A week from ..., title: Obamas to hit the campaign trail in first joint appearances with Harris, link: https://www.nbcnews.com/politics/2024-election/obamas-hit-campaign-trail-first-joint-appearances-harris-rcna176034, snippet: Item 1 of 3 Former U.S. first lady Michelle Obama and her husband, former U.S. President Barack Obama, stand on stage during Day 2 of the Democratic National Convention (DNC) in Chicago, Illinois ..., title: Obamas set to hit campaign trail with Kamala Harris for first time, link: https://www.reuters.com/world/us/obamas-set-hit-campaign-trail-with-kamala-harris-first-time-2024-10-18/, snippet: Barack and Michelle Obama will make their first campaign appearances alongside Kamala Harris at rallies in Georgia and Michigan. By Reid J. Epstein Reporting from Ashwaubenon, Wis. Here come the ..., title: Harris Will Join Michelle Obama and Barack Obama on Campaign Trail, link: https://www.nytimes.com/2024/10/18/us/politics/kamala-harris-michelle-obama-barack-obama.html, snippet: Obama's leaving office was "a turning point," Mirsky said. "That was the last time anybody felt normal." A few feet over, a 64-year-old physics professor named Eric Swanson who had grown ..., title: Obama's reemergence on the campaign trail for Harris comes as he ..., link: https://www.cnn.com/2024/10/13/politics/obama-campaign-trail-harris-biden/index.html
Result Type: <class 'str'>
```

After the change in this PR, `DuckDuckGoSearchResults` takes an
additional `output_format = "list" | "json" | "string"` ("string" =
current behavior, default). For example, invoking
`DuckDuckGoSearchResults(output_format="list")` return a list of
dictionaries in the format
```
[{'snippet': '...', 'title': '...', 'link': '...'}, ...]
```
e.g.

```
[{'snippet': "Obama has in a sense been wrestling with Trump's impact since the real estate magnate broke onto the political stage in 2015. Trump's victory the next year, defeating Obama's secretary of ...", 'title': "Obama's fears about Trump drive his stepped-up campaigning", 'link': 'https://www.washingtonpost.com/politics/2024/10/18/obama-trump-anxiety-harris-campaign/'}, {'snippet': 'Harris will hold a campaign event with former President Barack Obama in Georgia next Thursday, the first time the pair has campaigned side by side, a senior campaign official said. A week from ...', 'title': 'Obamas to hit the campaign trail in first joint appearances with Harris', 'link': 'https://www.nbcnews.com/politics/2024-election/obamas-hit-campaign-trail-first-joint-appearances-harris-rcna176034'}, {'snippet': 'Item 1 of 3 Former U.S. first lady Michelle Obama and her husband, former U.S. President Barack Obama, stand on stage during Day 2 of the Democratic National Convention (DNC) in Chicago, Illinois ...', 'title': 'Obamas set to hit campaign trail with Kamala Harris for first time', 'link': 'https://www.reuters.com/world/us/obamas-set-hit-campaign-trail-with-kamala-harris-first-time-2024-10-18/'}, {'snippet': 'Barack and Michelle Obama will make their first campaign appearances alongside Kamala Harris at rallies in Georgia and Michigan. By Reid J. Epstein Reporting from Ashwaubenon, Wis. Here come the ...', 'title': 'Harris Will Join Michelle Obama and Barack Obama on Campaign Trail', 'link': 'https://www.nytimes.com/2024/10/18/us/politics/kamala-harris-michelle-obama-barack-obama.html'}]
Result Type: <class 'list'>
```

---------

Co-authored-by: vbarda <vadym@langchain.dev>
2024-10-23 20:18:11 -04:00
Bagatur
968dccee04
core[patch]: convert_to_openai_tool Anthropic support (#27591) 2024-10-23 12:27:06 -07:00
Bagatur
217de4e6a6
langchain[patch]: de-beta init_chat_model (#27558) 2024-10-23 08:35:15 -07:00
Kwan Kin Chan
6d2a76ac05
langchain_huggingface: Fix multiple GPU usage bug in from_model_id function (#23628)
- [ ]  **Description:**   
   - pass the device_map into model_kwargs 
- removing the unused device_map variable in the hf_pipeline function
call
- [ ] **Issue:** issue #13128 
When using the from_model_id function to load a Hugging Face model for
text generation across multiple GPUs, the model defaults to loading on
the CPU despite multiple GPUs being available using the expected format
``` python
llm = HuggingFacePipeline.from_model_id(
    model_id="model-id",
    task="text-generation",
    device_map="auto",
)
```
Currently, to enable multiple GPU , we have to pass in variable in this
format instead
``` python
llm = HuggingFacePipeline.from_model_id(
    model_id="model-id",
    task="text-generation",
    device=None,
    model_kwargs={
        "device_map": "auto",
    }
)
```
This issue arises due to improper handling of the device and device_map
parameters.

- [ ] **Explanation:**
1. In from_model_id, the model is created using model_kwargs and passed
as the model variable of the pipeline function. So at this moment, to
load the model with multiple GPUs, "device_map" needs to be set to
"auto" within model_kwargs. Otherwise, the model defaults to loading on
the CPU.
2. The device_map variable in from_model_id is not utilized correctly.
In the pipeline function's source code of tnansformer:
- The device_map variable is stored in the model_kwargs dictionary
(lines 867-878 of transformers/src/transformers/pipelines/\__init__.py).
```python
    if device_map is not None:
        ......
        model_kwargs["device_map"] = device_map
```
- The model is constructed with model_kwargs containing the device_map
value ONLY IF it is a string (lines 893-903 of
transformers/src/transformers/pipelines/\__init__.py).
```python
    if isinstance(model, str) or framework is None:
        model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]}
        framework, model = infer_framework_load_model( ... , **model_kwargs, )
```
- Consequently, since a model object is already passed to the pipeline
function, the device_map variable from from_model_id is never used.

3. The device_map variable in from_model_id not only appears unused but
also causes errors. Without explicitly setting device=None, attempting
to load the model on multiple GPUs may result in the following error:
 ```
Device has 2 GPUs available. Provide device={deviceId} to
`from_model_id` to use available GPUs for execution. deviceId is -1
(default) for CPU and can be a positive integer associated with CUDA
device id.
  Traceback (most recent call last):
    File "foo.py", line 15, in <module>
      llm = HuggingFacePipeline.from_model_id(
File
"foo\site-packages\langchain_huggingface\llms\huggingface_pipeline.py",
line 217, in from_model_id
      pipeline = hf_pipeline(
File "foo\lib\site-packages\transformers\pipelines\__init__.py", line
1108, in pipeline
return pipeline_class(model=model, framework=framework, task=task,
**kwargs)
File "foo\lib\site-packages\transformers\pipelines\text_generation.py",
line 96, in __init__
      super().__init__(*args, **kwargs)
File "foo\lib\site-packages\transformers\pipelines\base.py", line 835,
in __init__
      raise ValueError(
ValueError: The model has been loaded with `accelerate` and therefore
cannot be moved to a specific device. Please discard the `device`
argument when creating your pipeline object.
```
This error occurs because, in from_model_id, the default values in from_model_id for device and device_map are -1 and None, respectively. It would passes the statement (`device_map is not None and device < 0`) and keep the device as -1 so the pipeline function later raises an error when trying to move a GPU-loaded model back to the CPU. 
19eb82e68b/libs/community/langchain_community/llms/huggingface_pipeline.py (L204-L213)




If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: vbarda <vadym@langchain.dev>
2024-10-22 21:41:47 -04:00
Fernando de Oliveira
ab205e7389
partners/openai + community: Async Azure AD token provider support for Azure OpenAI (#27488)
This PR introduces a new `azure_ad_async_token_provider` attribute to
the `AzureOpenAI` and `AzureChatOpenAI` classes in `partners/openai` and
`community` packages, given it's currently supported on `openai` package
as
[AsyncAzureADTokenProvider](https://github.com/openai/openai-python/blob/main/src/openai/lib/azure.py#L33)
type.

The reason for creating a new attribute is to avoid breaking changes.
Let's say you have an existing code that uses a `AzureOpenAI` or
`AzureChatOpenAI` instance to perform both sync and async operations.
The `azure_ad_token_provider` will work exactly as it is today, while
`azure_ad_async_token_provider` will override it for async requests.


If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-10-22 21:43:06 +00:00
orkhank
9a277cbe00
community: Update file_path type in JSONLoader.__init__() signature (#27535)
- **Description:** Change the type of the `file_path` argument from `str
| pathlib.Path` to `str | os.PathLike`, since the latter is more widely
used: https://stackoverflow.com/a/58541858
  
This is a very minor fix. I was just annoyed to see the red underline
displayed by Pylance in VS Code: `reportArgumentType`.

![image](https://github.com/user-attachments/assets/719a7f8e-acca-4dfa-89df-925e1d938c71)
  
  The changes do not affect the behavior of the code.
2024-10-22 11:18:36 -07:00
Eric Pinzur
f636c83321
community: Cassandra Vector Store: modernize implementation (#27253)
**Description:** 

This PR updates `CassandraGraphVectorStore` to be based off
`CassandraVectorStore`, instead of using a custom CQL implementation.
This allows users using a `CassandraVectorStore` to upgrade to a
`GraphVectorStore` without having to change their database schema or
re-embed documents.

This PR also updates the documentation of the `GraphVectorStore` base
class and contains native async implementations for the standard graph
methods: `traversal_search` and `mmr_traversal_search` in
`CassandraVectorStore`.

**Issue:** No issue number.

**Dependencies:** https://github.com/langchain-ai/langchain/pull/27078
(already-merged)

**Lint and test**: 
- Lint and tests all pass, including existing
`CassandraGraphVectorStore` tests.
- Also added numerous additional tests based of the tests in
`langchain-astradb` which cover many more scenarios than the existing
tests for `Cassandra` and `CassandraGraphVectorStore`

** BREAKING CHANGE**

Note that this is a breaking change for existing users of
`CassandraGraphVectorStore`. They will need to wipe their database table
and restart.

However:
- The interfaces have not changed. Just the underlying storage
mechanism.
- Any one using `langchain_community.vectorstores.Cassandra` can instead
use `langchain_community.graph_vectorstores.CassandraGraphVectorStore`
and they will gain Graph capabilities without having to re-embed their
existing documents. This is the primary goal of this PR.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-22 18:11:11 +00:00
Vadym Barda
0640cbf2f1
huggingface[patch]: hide client field in HuggingFaceEmbeddings (#27522) 2024-10-21 17:37:07 -04:00
Chun Kang Lu
380449a7a9
core: fix Image prompt template hardcoded template format (#27495)
Fixes #27411 

**Description:** Adds `template_format` to the `ImagePromptTemplate`
class and updates passing in the `template_format` parameter from
ChatPromptTemplate instead of the hardcoded "f-string".
Also updated docs and typing related to `template_format` to be more
up-to-date and specific.

**Dependencies:** None

**Add tests and docs**: Added unit tests to validate fix. Needed to
update `test_chat` snapshot due to adding new attribute
`template_format` in `ImagePromptTemplate`.

---------

Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-10-21 17:31:40 -04:00
bbaltagi-dtsl
403c0ea801
community: fix DallE hidden open_api_key (#26996)
Thank you for contributing to LangChain!

- [ X] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
  - Example: "community: add foobar LLM"


- [ X] 
    - **Issue:** issue #26941


Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-21 19:46:56 +00:00
nodfans
cfcf783cb5
community: fix a typo in planner_prompt.py (#27489)
Description: Fix typo in planner_prompt.py.
2024-10-21 14:59:33 +00:00
Erick Friis
97a819d578
community: fix lint from new mypy (#27474) 2024-10-18 20:08:03 +00:00
Erick Friis
c397baa85f
community: release 0.3.3 (#27472) 2024-10-18 12:52:15 -07:00
Erick Friis
4ceb28009a
mongodb: migrate to repo (#27467) 2024-10-18 12:35:12 -07:00
Erick Friis
a562c54f7d
azure-dynamic-sessions: migrate to repo (#27468) 2024-10-18 12:30:48 -07:00
Erick Friis
30660786b3
langchain: release 0.3.4 (#27458) 2024-10-18 11:59:54 -07:00
Erick Friis
2cf2cefe39
partners/openai: release 0.2.3 (#27457) 2024-10-18 08:16:01 -07:00
Erick Friis
7d65a32ee0
openai: audio modality, remove sockets from unit tests (#27436) 2024-10-18 08:02:09 -07:00
Erick Friis
f9cc9bdcf3
core: release 0.3.12 (#27410) 2024-10-17 06:32:40 -07:00
Erick Friis
0ebddabf7d
docs, core: error messaging [wip] (#27397) 2024-10-17 03:39:36 +00:00
Eugene Yurtsev
202d7f6c4a
core[patch]: 0.3.11 release (#27403)
Core bump to 0.3.11
2024-10-16 15:39:37 -04:00
Bagatur
a4392b070d core[patch]: add convert_to_openai_messages util (#27263)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-16 17:10:10 +00:00
sByteman
31e7664afd
community[minor]: add proxy support to RecursiveUrlLoader (#27364)
**Description**
This PR introduces the proxies parameter to the RecursiveUrlLoader
class, allowing the user to specify proxy servers for requests. This
update enables crawling through proxy servers, providing enhanced
flexibility for network configurations.
The key changes include:
  1.Added an optional proxies parameter to the constructor (__init__).
2.Updated the documentation to explain the proxies parameter usage with
an example.
3.Modified the _get_child_links_recursive method to pass the proxies
parameter to the requests.get function.



**Sample Usage**

```python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader

proxies = {
    "http": "http://localhost:1080",
    "https": "http://localhost:1080",
}
url = "https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel"
loader = RecursiveUrlLoader(
    url=url, max_depth=1, extractor=lambda x: Soup(x, "html.parser").text,proxies=proxies
)
docs = loader.load()
```

---------

Co-authored-by: root <root@thb>
2024-10-16 16:29:59 +00:00
Yuki Watanabe
b8bfebd382
community: Add deprecation notice for Databricks integration in langchain-community (#27355)
We have released the
[langchain-databricks](https://github.com/langchain-ai/langchain-databricks)
package for Databricks integration. This PR deprecates the legacy
classes within `langchain-community`.

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-16 02:20:40 +00:00
xsai9101
15c1ddaf99
community: Add support for clob datatype in oracle database (#27330)
**Description**:
This PR add support of clob/blob data type for oracle document loader,
clob/blob can only be read by oracledb package when connection is open,
so reformat code to process data before connection closes.

**Dependencies**:
oracledb package same as before. pip install oracledb

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-16 02:19:20 +00:00
Enes Bol
3f74dfc3d8
community[patch]: Fix vLLM integration to filter SamplingParams (#27367)
**Description:**
- This pull request addresses a bug in Langchain's VLLM integration,
where the use_beam_search parameter was erroneously passed to
SamplingParams. The SamplingParams class in vLLM does not support the
use_beam_search argument, which caused a TypeError.

- This PR introduces logic to filter out unsupported parameters,
ensuring that only valid parameters are passed to SamplingParams. As a
result, the integration now functions as expected without errors.

- The bug was reproduced by running the code sample from Langchain’s
documentation, which triggered the error due to the invalid parameter.
This fix resolves that error by implementing proper parameter filtering.

**VLLM Sampling Params Class:**
https://github.com/vllm-project/vllm/blob/main/vllm/sampling_params.py

**Issue:**
I could not found an Issue that belongs to this. Fixes "TypeError:
Unexpected keyword argument 'use_beam_search'" error when using VLLM
from Langchain.

**Dependencies:**
None.

**Tests and Documentation**:
Tests:
No new functionality was added, but I tested the changes by running
multiple prompts through the VLLM integration with various parameter
configurations. All tests passed successfully without breaking
compatibility.

Docs
No documentation changes were necessary as this is a bug fix.

**Reproducing the Error:**

https://python.langchain.com/docs/integrations/llms/vllm/

The code sample from the original documentation can be used to reproduce
the error I got.

from langchain_community.llms import VLLM
llm = VLLM(
    model="mosaicml/mpt-7b",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=128,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
)
print(llm.invoke("What is the capital of France ?"))

![image](https://github.com/user-attachments/assets/3782d6ac-1f7b-4acc-bf2c-186216149de5)


This PR resolves the issue by ensuring that only valid parameters are
passed to SamplingParams.
2024-10-15 21:57:50 +00:00
Erick Friis
edf6d0a0fb
partners/couchbase: release 0.2.0 (attempt 2) (#27375) 2024-10-15 14:51:05 -07:00
Jorge Piedrahita Ortiz
12fea5b868
community: sambastudio chat model integration minor fix (#27238)
**Description:** sambastudio chat model integration minor fix
 fix default params
 fix usage metadata when streaming
2024-10-15 13:24:36 -04:00
ZhangShenao
f3925d71b9
community: Fix word spelling in Text2vecEmbeddings (#27183)
Fix word spelling in `Text2vecEmbeddings`
2024-10-15 09:28:48 -07:00
Erick Friis
92ae61bcc8
multiple: rely on asyncio_mode auto in tests (#27200) 2024-10-15 16:26:38 +00:00
William FH
0a3e089827
[Anthropic] Shallow Copy (#27105)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-15 15:50:48 +00:00
Matthew Peveler
c6533616b6
docs: fix community pgvector deprecation warning formatting (#27094)
**Description**: PR fixes some formatting errors in deprecation message
in the `langchain_community.vectorstores.pgvector` module, where it was
missing spaces between a few words, and one word was misspelled.
**Issue**: n/a
**Dependencies**: n/a

Signed-off-by: mpeveler@timescale.com
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-15 15:45:53 +00:00
Erick Friis
3fa5ce3e5f
community: clear mypy syntax warning in openapi (#27370)
not completely clear the regex is functional
2024-10-15 15:43:53 +00:00
Ahmet Yasin Aytar
443b37403d
community: refactor Arxiv search logic (#27084)
PR message:

Description:
This PR refactors the Arxiv API wrapper by extracting the Arxiv search
logic into a helper function (_fetch_results) to reduce code duplication
and improve maintainability. The helper function is used in methods like
get_summaries_as_docs, run, and lazy_load, streamlining the code and
making it easier to maintain in the future.

Issue:
This is a minor refactor, so no specific issue is being fixed.

Dependencies:
No new dependencies are introduced with this change.

Add tests and docs:
No new integrations were added, so no additional tests or docs are
necessary for this PR.
Lint and test:
I have run make format, make lint, and make test to ensure all checks
pass successfully.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-15 08:43:03 -07:00
Qiu Qin
57fbc6bdf1
community: Update OCI data science integration (#27083)
This PR updates the integration with OCI data science model deployment
service.

- Update LLM to support streaming and async calls.
- Added chat model.
- Updated tests and docs.
- Updated `libs/community/scripts/check_pydantic.sh` since the use of
`@pre_init` is removed from existing integration.
- Updated `libs/community/extended_testing_deps.txt` as this integration
requires `langchain_openai`.

---------

Co-authored-by: MING KANG <ming.kang@oracle.com>
Co-authored-by: Dmitrii Cherkasov <dmitrii.cherkasov@oracle.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-15 08:32:54 -07:00
Rafael Miller
fc14f675f1
Community: Updated Firecrawl Document Loader to v1 (#26548)
This PR updates the Firecrawl Document Loader to use the recently
released V1 API of Firecrawl.

**Key Updates:**

**Firecrawl V1 Integration:** Updated the document loader to leverage
the new Firecrawl V1 API for improved performance, reliability, and
developer experience.

**Map Functionality Added:** Introduced the map mode for more flexible
document loading options.

These updates enhance the integration and provide access to the latest
features of Firecrawl.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-10-15 13:13:28 +00:00
Max Tran
8fea07f92e
community: fixed KeyError: 'client' (#27345)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
  - Example: "community: add foobar LLM"
Updated

- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
twitter: @MaxHTran

- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Not needed due to small change

- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Max Tran <maxtra@amazon.com>
2024-10-14 20:51:13 +00:00
Martin Triska
8dc4bec947
[community] [Bugfix] base_o365 document loader metadata needs to be JSON serializable (#26322)
In order for indexer to work, all metadata in the documents need to be
JSON serializable. Timestamps are not.

See here:

https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/indexing/api.py#L83-L89

@eyurtsev could you please review? It's a tiny PR :-)
2024-10-14 12:48:31 -04:00
Trayan Azarov
59bbda9ba3
chroma: Deprecating versions 0.5.7 thru 0.5.12 (#27305)
**Description:** Deprecated version of Chroma >=0.5.5 <0.5.12 due to a
serious correctness issue that caused some embeddings for deployments
with multiple collections to be lost (read more on the issue in Chroma
repo)
**Issue:** chroma-core/chroma#2922 (fixed by chroma-core/chroma##2923
and released in
[0.5.13](https://github.com/chroma-core/chroma/releases/tag/0.5.13))
**Dependencies:** N/A
**Twitter handle:** `@t_azarov`
2024-10-14 11:56:05 -04:00
Marcelo Nunes Alves
5647276998
community: Problem with embeddings in new versions of clickhouse. (#26041)
Starting with Clickhouse version 24.8, a different type of configuration
has been introduced in the vectorized data ingestion, and if this
configuration occurs, an error occurs when generating the table. As can
be seen below:

![Screenshot from 2024-09-04
11-48-00](https://github.com/user-attachments/assets/70840a93-1001-490c-921a-26924c51d9eb)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-11 18:54:50 +00:00
Eugene Yurtsev
5b9b8fe80f
core[patch]: Ignore ASYNC110 to upgrade to newest ruff version (#27229)
Ignoring ASYNC110 with explanation
2024-10-09 11:25:58 -04:00
Vittorio Rigamonti
7da2efd9d3
community[minor]: VectorStore Infinispan. Adding TLS and authentication (#23522)
**Description**:
this PR enable VectorStore TLS and authentication (digest, basic) with
HTTP/2 for Infinispan server.
Based on httpx.

Added docker-compose facilities for testing
Added documentation

**Dependencies:**
requires `pip install httpx[http2]` if HTTP2 is needed

**Twitter handle:**
https://twitter.com/infinispan
2024-10-09 10:51:39 -04:00
Diao Zihao
4553573acb
core[patch],langchain[patch],community[patch]: Bump version dependency of tenacity to >=8.1.0,!=8.4.0,<10 (#27201)
This should fixes the compatibility issue with graprag as in

- https://github.com/langchain-ai/langchain/discussions/25595

Here are the release notes for tenacity 9
(https://github.com/jd/tenacity/releases/tag/9.0.0)

---------

Signed-off-by: Zihao Diao <hi@ericdiao.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-09 14:00:45 +00:00
Stefano Lottini
d05fdd97dd
community: Cassandra Vector Store: extend metadata-related methods (#27078)
**Description:** this PR adds a set of methods to deal with metadata
associated to the vector store entries. These, while essential to the
Graph-related extension of the `Cassandra` vector store, are also useful
in themselves. These are (all come in their sync+async versions):

- `[a]delete_by_metadata_filter`
- `[a]replace_metadata`
- `[a]get_by_document_id`
- `[a]metadata_search`

Additionally, a `[a]similarity_search_with_embedding_id_by_vector`
method is introduced to better serve the store's internal working (esp.
related to reranking logic).

**Issue:** no issue number, but now all Document's returned bear their
`.id` consistently (as a consequence of a slight refactoring in how the
raw entries read from DB are made back into `Document` instances).

**Dependencies:** (no new deps: packaging comes through langchain-core
already; `cassio` is now required to be version 0.1.10+)


**Add tests and docs**
Added integration tests for the relevant newly-introduced methods.
(Docs will be updated in a separate PR).

**Lint and test** Lint and (updated) test all pass.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-09 06:41:34 +00:00
Erick Friis
84c05b031d
community: release 0.3.2 (#27214) 2024-10-08 23:33:55 -07:00
Serena Ruan
a7c1ce2b3f
[community] Add timeout control and retry for UC tool execution (#26645)
Add timeout at client side for UCFunctionToolkit and add retry logic.
Users could specify environment variable
`UC_TOOL_CLIENT_EXECUTION_TIMEOUT` to increase the timeout value for
retrying to get the execution response if the status is pending. Default
timeout value is 120s.


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

Tested in Databricks:
<img width="1200" alt="image"
src="https://github.com/user-attachments/assets/54ab5dfc-5e57-4941-b7d9-bfe3f8ad3f62">



- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Signed-off-by: serena-ruan <serena.rxy@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-09 06:31:48 +00:00
Tomaz Bratanic
481bd25d29
community: Fix database connections for neo4j (#27190)
Fixes https://github.com/langchain-ai/langchain/issues/27185

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-08 23:47:55 +00:00
Erick Friis
cedf4d9462
langchain: release 0.3.3 (#27213) 2024-10-08 16:39:42 -07:00
Erick Friis
7264fb254c
core: release 0.3.10 (#27209) 2024-10-08 16:21:42 -07:00
Bagatur
ce33c4fa40
openai[patch]: default temp=1 for o1 (#27206) 2024-10-08 15:45:21 -07:00
RIdham Golakiya
73ad7f2e7a
langchain_chroma[patch]: updated example for get documents with where clause (#26767)
Example updated for vectorstore ChromaDB.

If we want to apply multiple filters then ChromaDB supports filters like
this:
Reference: [ChromaDB
filters](https://cookbook.chromadb.dev/core/filters/)

Thank you.
2024-10-08 20:21:58 +00:00
Bagatur
e3e9ee8398
core[patch]: utils for adding/subtracting usage metadata (#27203) 2024-10-08 13:15:33 -07:00
ccurme
e3920f2320
community[patch]: fix structured_output in llamacpp integration (#27202)
Resolves https://github.com/langchain-ai/langchain/issues/25318.
2024-10-08 15:16:59 -04:00
Erick Friis
b84e00283f
standard-tests: test that only one chunk sets input_tokens (#27177) 2024-10-08 11:35:32 -07:00
Ajayeswar Reddy
9b7bdf1a26
Fixed typo in llibs/community/langchain_community/storage/sql.py (#27029)
- [ ] **PR title**: docs: fix typo in SQLStore import path

- [ ] **PR message**: 
- **Description:** This PR corrects a typo in the docstrings for the
class SQLStore(BaseStore[str, bytes]). The import path in the docstring
currently reads from langchain_rag.storage import SQLStore, which should
be changed to langchain_community.storage import SQLStore. This typo is
also reflected in the official documentation.
    - **Issue:** N/A
    - **Dependencies:** None
    - **Twitter handle:** N/A

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-08 17:51:26 +00:00
Vadym Barda
8d27325dbc
core[patch]: support ValidationError from pydantic v1 in tools (#27194) 2024-10-08 10:19:04 -04:00
Christophe Bornet
16f5fdb38b
core: Add various ruff rules (#26836)
Adds
- ASYNC
- COM
- DJ
- EXE
- FLY
- FURB
- ICN
- INT
- LOG
- NPY
- PD
- Q
- RSE
- SLOT
- T10
- TID
- YTT

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-07 22:30:27 +00:00
Erick Friis
5c826faece
core: update make format to fix all autofixable things (#27174) 2024-10-07 15:20:47 -07:00
Christophe Bornet
d31ec8810a
core: Add ruff rules for error messages (EM) (#26965)
All auto-fixes

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-07 22:12:28 +00:00
Oleksii Pokotylo
37ca468d03
community: AzureSearch: fix reranking for empty lists (#27104)
**Description:** 
  Fix reranking for empty lists 

**Issue:** 
```
ValueError: not enough values to unpack (expected 3, got 0)
    documents, scores, vectors = map(list, zip(*docs))
  File langchain_community/vectorstores/azuresearch.py", line 1680, in _reorder_results_with_maximal_marginal_relevance
```

Co-authored-by: Oleksii Pokotylo <oleksii.pokotylo@pwc.com>
2024-10-07 15:27:09 -04:00
Christophe Bornet
c4ebccfec2
core[minor]: Improve support for id in VectorStore (#26660)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-07 15:01:08 -04:00
Bharat Ramanathan
931ce8d026
core[patch]: Update AsyncCallbackManager to honor run_inline attribute and prevent context loss (#26885)
## Description

This PR fixes the context loss issue in `AsyncCallbackManager`,
specifically in `on_llm_start` and `on_chat_model_start` methods. It
properly honors the `run_inline` attribute of callback handlers,
preventing race conditions and ordering issues.

Key changes:
1. Separate handlers into inline and non-inline groups.
2. Execute inline handlers sequentially for each prompt.
3. Execute non-inline handlers concurrently across all prompts.
4. Preserve context for stateful handlers.
5. Maintain performance benefits for non-inline handlers.

**These changes are implemented in `AsyncCallbackManager` rather than
`ahandle_event` because the issue occurs at the prompt and message_list
levels, not within individual events.**

## Testing

- Test case implemented in #26857 now passes, verifying execution order
for inline handlers.

## Related Issues

- Fixes issue discussed in #23909 

## Dependencies

No new dependencies are required.

---

@eyurtsev: This PR implements the discussed changes to respect
`run_inline` in `AsyncCallbackManager`. Please review and advise on any
needed changes.

Twitter handle: @parambharat

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-07 14:59:29 -04:00
João Carlos Ferra de Almeida
780ce00dea
core[minor]: add **kwargs to index and aindex functions for custom vector_field support (#26998)
Added `**kwargs` parameters to the `index` and `aindex` functions in
`libs/core/langchain_core/indexing/api.py`. This allows users to pass
additional arguments to the `add_documents` and `aadd_documents`
methods, enabling the specification of a custom `vector_field`. For
example, users can now use `vector_field="embedding"` when indexing
documents in `OpenSearchVectorStore`

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-07 14:52:50 -04:00
Jorge Piedrahita Ortiz
14de81b140
community: sambastudio chat model (#27056)
**Description:**: sambastudio chat model integration added, previously
only LLM integration
     included docs and tests

---------

Co-authored-by: luisfucros <luisfucros@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-07 14:31:39 -04:00
Aditya Anand
f70650f67d
core[patch]: correct typo doc-string for astream_events method (#27108)
This commit addresses a typographical error in the documentation for the
async astream_events method. The word 'evens' was incorrectly used in
the introductory sentence for the reference table, which could lead to
confusion for users.\n\n### Changes Made:\n- Corrected 'Below is a table
that illustrates some evens that might be emitted by various chains.' to
'Below is a table that illustrates some events that might be emitted by
various chains.'\n\nThis enhancement improves the clarity of the
documentation and ensures accurate terminology is used throughout the
reference material.\n\nIssue Reference: #27107
2024-10-07 14:12:42 -04:00
Bagatur
38099800cc
docs: fix anthropic max_tokens docstring (#27166) 2024-10-07 16:51:42 +00:00
ogawa
07dd8dd3d7
community[patch]: update gpt-4o cost (#27038)
updated OpenAI cost definition according to the following:
https://openai.com/api/pricing/
2024-10-07 09:06:30 -04:00
Bagatur
06ce5d1d5c
anthropic[patch]: Release 0.2.3 (#27126) 2024-10-04 22:38:03 +00:00
Bagatur
0b8416bd2e
anthropic[patch]: fix input_tokens when cached (#27125) 2024-10-04 22:35:51 +00:00
Bagatur
bd5b335cb4
standard-tests[patch]: fix oai usage metadata test (#27122) 2024-10-04 20:00:48 +00:00
Bagatur
827bdf4f51
fireworks[patch]: Release 0.2.1 (#27120) 2024-10-04 18:59:15 +00:00
Bagatur
98942edcc9
openai[patch]: Release 0.2.2 (#27119) 2024-10-04 11:54:01 -07:00
Bagatur
414fe16071
anthropic[patch]: Release 0.2.2 (#27118) 2024-10-04 11:53:53 -07:00
Bagatur
11df1b2b8d
core[patch]: Release 0.3.9 (#27117) 2024-10-04 18:35:33 +00:00
Scott Hurrey
558fb4d66d
box: Add citation support to langchain_box.retrievers.BoxRetriever when used with Box AI (#27012)
Thank you for contributing to LangChain!

**Description:** Box AI can return responses, but it can also be
configured to return citations. This change allows the developer to
decide if they want the answer, the citations, or both. Regardless of
the combination, this is returned as a single List[Document] object.

**Dependencies:** Updated to the latest Box Python SDK, v1.5.1
**Twitter handle:** BoxPlatform


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-04 18:32:34 +00:00
Bagatur
1e768a9ec7
anthropic[patch]: correctly handle tool msg with empty list (#27109) 2024-10-04 11:30:50 -07:00
Bagatur
4935a14314
core,integrations[minor]: Dont error on fields in model_kwargs (#27110)
Given the current erroring behavior, every time we've moved a kwarg from
model_kwargs and made it its own field that was a breaking change.
Updating this behavior to support the old instantiations /
serializations.

Assuming build_extra_kwargs was not something that itself is being used
externally and needs to be kept backwards compatible
2024-10-04 11:30:27 -07:00
Bagatur
0495b7f441
anthropic[patch]: add usage_metadata details (#27087)
fixes https://github.com/langchain-ai/langchain/pull/27087
2024-10-04 08:46:49 -07:00
Erick Friis
e8e5d67a8d
openai: fix None token detail (#27091)
happens in Azure
2024-10-04 01:25:38 +00:00
Erick Friis
ab4dab9a0c
core: fix batch race condition in FakeListChatModel (#26924)
fixed #26273
2024-10-03 23:14:31 +00:00
Bagatur
87fc5ce688
core[patch]: exclude model cache from ser (#27086) 2024-10-03 22:00:31 +00:00
Bagatur
c09da53978
openai[patch]: add usage metadata details (#27080) 2024-10-03 14:01:03 -07:00
Bagatur
546dc44da5
core[patch]: add UsageMetadata details (#27072) 2024-10-03 20:36:17 +00:00
Tibor Reiss
47a9199fa6
community[patch]: Fix missing protected_namespaces (#27076)
Fixes #26861

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-03 20:12:11 +00:00
Bagatur
2a54448a0a
langchain[patch]: Release 0.3.2 (#27073) 2024-10-03 18:13:23 +00:00
Bharat Ramanathan
103e573f9b
community[patch]: chore warn deprecate the wandb callback handler (#27062)
- **Description:**: This PR deprecates the wandb callback handler in
favor of the new
[WeaveTracer](https://weave-docs.wandb.ai/guides/integrations/langchain#using-weavetracer)
in W&B
- **Dependencies:** No dependencies, just a deprecation warning.
- **Twitter handle:** @parambharat


@baskaryan
2024-10-03 11:59:20 -04:00
Eugene Yurtsev
635c55c039
core[patch]: Release 0.3.8 (#27046)
0.3.8 release for core
2024-10-02 16:58:38 +00:00
Eugene Yurtsev
74bf620e97
core[patch]: Support injected tool args that are arbitrary types (#27045)
This adds support for inject tool args that are arbitrary types when
used with pydantic 2.

We'll need to add similar logic on the v1 path, and potentially mirror
the config from the original model when we're doing the subset.
2024-10-02 12:50:58 -04:00
Bagatur
099235da01
Revert "huggingface[patch]: make HuggingFaceEndpoint serializable (#2… (#27032)
…7027)"

This reverts commit b5e28d3a6d.
2024-10-01 21:26:38 +00:00
Bagatur
5f2e93ffea
huggingface[patch]: xfail test (#27031) 2024-10-01 21:14:07 +00:00
Bagatur
b5e28d3a6d
huggingface[patch]: make HuggingFaceEndpoint serializable (#27027) 2024-10-01 13:16:10 -07:00
ccurme
9d10151123
core[patch]: fix init of RunnableAssign (#26903)
Example in API ref currently raises ValidationError.

Resolves https://github.com/langchain-ai/langchain/issues/26862
2024-10-01 14:21:54 -04:00
Erick Friis
95a87291fd
community: deprecate community ollama integrations (#26733) 2024-10-01 09:18:07 -07:00
ZhangShenao
e317d457cf
Bug-Fix[Community] Fix FastEmbedEmbeddings (#26764)
#26759 

- Fix https://github.com/langchain-ai/langchain/issues/26759 
- Change `model` param from private to public, which may not be
initiated.
- Add test case
2024-09-30 21:23:08 -04:00
Erick Friis
a8e1577f85
milvus: mv to external repo (#26920) 2024-10-01 00:38:30 +00:00
Erick Friis
35f6393144
unstructured: mv to external repo (#26923) 2024-09-30 17:38:21 -07:00
Erick Friis
7ecd720120
multiple: update docs urls to latest 2 (#26837) 2024-09-30 17:37:07 -07:00
Tomaz Bratanic
446144e7c6
Update neo4j vector procedures (#26775) 2024-09-30 14:45:09 -07:00
federico-pisanu
2538963945
core[patch]: improve index/aindex api when batch_size<n_docs (#25754)
- **Description:** prevent index function to re-index entire source
document even if nothing has changed.
- **Issue:** #22135

I worked on a solution to this issue that is a compromise between being
cheap and being fast.
In the previous code, when batch_size is greater than the number of docs
from a certain source almost the entire source is deleted (all documents
from that source except for the documents in the first batch)
My solution deletes documents from vector store and record manager only
if at least one document has changed for that source.

Hope this can help!

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-09-30 20:57:41 +00:00
Eugene Yurtsev
7fde2791dc
core[patch]: Add kwargs to Runnable (#27008)
Fixes #26685

---------

Co-authored-by: Tibor Reiss <tibor.reiss@gmail.com>
2024-09-30 16:45:29 -04:00
Christophe Bornet
2a6abd3f0a
community[patch]: Add docstring for Links (#25969)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-09-30 20:33:50 +00:00
Mohammad Mohtashim
e12f570ead
Merge pull request #26794
* [chore]: Agent Observation should be casted to string to avoid errors

* Merge branch 'master' into fix_observation_type_streaming

* [chore]: Using Json.dumps

* [chore]: Exact same logic as  when casting agent oobservation to string
2024-09-30 15:54:51 -04:00
Bagatur
34bd718fe1
core[patch]: Release 0.3.7 (#27004) 2024-09-30 18:52:42 +00:00
Bagatur
248be02259
core[patch]: fix structured prompt template format (#27003)
template_format is an init argument on ChatPromptTemplate but not an
attribute on the object so was getting shoved into
StructuredPrompt.structured_ouptut_kwargs
2024-09-30 11:47:46 -07:00
Bagatur
0078493a80
fireworks[patch]: allow tool_choice with multiple tools (#26999)
https://docs.fireworks.ai/api-reference/post-chatcompletions
2024-09-30 11:28:43 -07:00
Bagatur
c7120d87dd
groq[patch]: support tool_choice=any/required (#27000)
https://console.groq.com/docs/api-reference#chat-create
2024-09-30 11:28:35 -07:00
Christophe Bornet
db8845a62a
core: Add ruff rules for pycodestyle Warning (W) (#26964)
All auto-fixes.
2024-09-30 09:31:43 -04:00
Bagatur
9404e7af9d
openai[patch]: exclude http client (#26891)
httpx clients aren't serializable
2024-09-29 11:16:27 -07:00
Ben Chambers
29bf89db25
community: Add conversions from GVS to networkx (#26906)
These allow converting linked documents (such as those used with
GraphVectorStore) to networkx for rendering and/or in-memory graph
algorithms such as community detection.
2024-09-27 16:48:55 -04:00
Christophe Bornet
7809b31b95
core[patch]: Add ruff rules for flake8-simplify (SIM) (#26848)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-09-27 20:13:23 +00:00
Eugene Yurtsev
de0b48c41a
docs: Upgrade examples with RunnableWithMessageHistory to langgraph memory (#26855)
This PR updates the documentation examples that used
RunnableWithMessageHistory to show how to achieve the same
implementation with langgraph memory.

Some of the underlying PRs (not all of them):

- docs[patch]: update chatbot tutorial and migration guide (#26780)
- docs[patch]: update chatbot memory how-to (#26790)
- docs[patch]: update chatbot tools how-to (#26816)
- docs: update chat history in rag how-to (#26821)
- docs: update trim messages notebook (#26793)
- docs: clean up imports in how to guide for rag qa with chat history
(#26825)
- docs[patch]: update conversational rag tutorial (#26814)

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
Co-authored-by: mercyspirit <ziying.qiu@gmail.com>
Co-authored-by: aqiu7 <aqiu7@gatech.edu>
Co-authored-by: John <43506685+Coniferish@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Subhrajyoty Roy <subhrajyotyroy@gmail.com>
Co-authored-by: Rajendra Kadam <raj.725@outlook.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Devin Gaffney <itsme@devingaffney.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-09-27 20:04:30 +00:00
Christophe Bornet
f4e738bb40
core: Add ruff rules for PIE (#26939)
All auto-fixes.
2024-09-27 12:08:35 -04:00
ccurme
39987ebd91
openai[patch]: update deprecation target in API ref (#26921) 2024-09-27 08:42:31 -04:00
Subhrajyoty Roy
7f37fd8b80
community[patch]: callback before yield for cloudflare (#26927)
**Description:** Moves yield to after callback for `_stream` function
for the cloudfare workersai model in the community llm package
**Issue:** #16913
2024-09-27 08:42:01 -04:00
Youshin Kim
2d9a09dfa4
Fix typo in mlflow code example in mlflow.py (#26931)
- [x] PR title: Fix typo in code example in mlflow.py
- In libs/community/langchain_community/chat_models/mlflow.py
2024-09-27 12:41:39 +00:00
Subhrajyoty Roy
7037ba0f06
community[patch]: callback before yield for mlx pipeline (#26928)
**Description:** Moves yield to after callback for `_stream` function
for the MLX pipeline model in the community llm package
**Issue:** #16913
2024-09-27 08:41:34 -04:00
Subhrajyoty Roy
adcfecdb67
community[patch]: callback before yield for textgen (#26929)
**Description:** Moves callback to before yield for `_stream` and
`_astream` function for the textgen model in the community llm package
**Issue:** #16913
2024-09-27 08:41:13 -04:00
Subhrajyoty Roy
5f2cc4ecb2
community[patch]: callback before yield for titan takeoff (#26930)
**Description:** Moves yield to after callback for `_stream` function
for the titan takeoff model in the community llm package
**Issue:** #16913
2024-09-27 08:40:22 -04:00
Emmanuel Sciara
c6350d636e
core[fix]: using async rate limiter methods in async code (#26914)
**Description:** Replaced blocking (sync) rate_limiter code in async
methods.

**Issue:** #26913

**Dependencies:** N/A

**Twitter handle:** no need 🤗
2024-09-26 20:44:28 +00:00
Abhi Agarwal
696114e145
community: add sqlite-vec vectorstore (#25003)
**Description**:

Adds a vector store integration with
[sqlite-vec](https://alexgarcia.xyz/sqlite-vec/), the successor to
sqlite-vss that is a single C file with no external dependencies.

Pretty straightforward, just copy-pasted the sqlite-vss integration and
made a few tweaks and added integration tests. Only question is whether
all documentation should be directed away from sqlite-vss if it is
defacto deprecated (cc @asg017).

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: philippe-oger <philippe.oger@adevinta.com>
2024-09-26 17:37:10 +00:00
Erick Friis
8bc12df2eb
voyageai: new models (#26907)
Co-authored-by: fzowl <zoltan@voyageai.com>
Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com>
2024-09-26 17:07:10 +00:00
Subhrajyoty Roy
ba467f1a36
community[patch]: callback before yield for gigachat (#26881)
**Description:** Moves yield to after callback for `_stream` and
`_astream` function for the gigachat model in the community llm package
**Issue:** #16913
2024-09-26 12:47:28 -04:00
Subhrajyoty Roy
11e703a97e
community[patch]: callback before yield for google palm (#26882)
**Description:** Moves yield to after callback for `_stream` function
for the google palm model in the community package
**Issue:** #16913
2024-09-26 12:47:05 -04:00
Julius Stopforth
121e79b1f0
core: Fix IndexError when trim_messages invoked with empty list (#26896)
This prevents `trim_messages` from raising an `IndexError` when invoked
with `include_system=True`, `strategy="last"`, and an empty message
list.

Fixes #26895

Dependencies: none
2024-09-26 11:29:58 -04:00
ccurme
7091a1a798
openai[patch]: increase token limit in azure integration tests (#26901)
`test_json_mode` occasionally runs into this
2024-09-26 14:31:33 +00:00
Erick Friis
2ea5f60cc5
experimental: migrate to external repo (#26879)
security scanners can't distinguish monorepo sources from each other.
this will resolve issues for folks trying to use e.g. langchain-core but
getting security issues from experimental flagged!
2024-09-25 19:02:19 -07:00
Erick Friis
6f3c8313ba
community: bump langchain version (#26876) 2024-09-25 12:58:24 -07:00
Erick Friis
e068407f18
community: bump core versoin (#26875) 2024-09-25 12:57:16 -07:00
Eugene Yurtsev
25cb44c9ee
0.3.1 release community (#26872)
Release for 0.3.1

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-25 19:38:53 +00:00
Erick Friis
9a31ad6f60
langchain: release 0.3.1 (#26868) 2024-09-25 11:43:54 -07:00
Erick Friis
ef2ab26113
core: release 0.3.6 (#26863) 2024-09-25 11:05:53 -07:00
Bagatur
eaffa92c1d
openai[patch]: Release 0.2.1 (#26858) 2024-09-25 15:55:49 +00:00
Rajendra Kadam
51c4393298
community[patch]: Fix validation error in SettingsConfigDict across multiple Langchain modules (#26852)
- **Description:** This pull request addresses the validation error in
`SettingsConfigDict` due to extra fields in the `.env` file. The issue
is prevalent across multiple Langchain modules. This fix ensures that
extra fields in the `.env` file are ignored, preventing validation
errors.
  **Changes include:**
    - Applied fixes to modules using `SettingsConfigDict`.

- **Issue:** NA, similar
https://github.com/langchain-ai/langchain/issues/26850
- **Dependencies:** NA
2024-09-25 10:02:14 -04:00
Christophe Bornet
3a1b9259a7
core: Add ruff rules for comprehensions (C4) (#26829) 2024-09-25 09:34:17 -04:00
Rajendra Kadam
7e5a9c317f
community[minor]: [Pebblo] Enhance PebbloSafeLoader to take anonymize flag (#26812)
- **Description:** The flag is named `anonymize_snippets`. When set to
true, the Pebblo server will anonymize snippets by redacting all
personally identifiable information (PII) from the snippets going into
VectorDB and the generated reports
- **Issue:** NA
- **Dependencies:** NA
- **docs**: Updated
2024-09-25 09:33:06 -04:00
Rajendra Kadam
92003b3724
community[patch]: [SharePointLoader] Fix validation error in _O365Settings due to extra fields in .env file (#26851)
**Description:** Fix validation error in _O365Settings by ignoring extra
fields in .env file
**Issue:** https://github.com/langchain-ai/langchain/issues/26850
**Dependencies:** NA
2024-09-25 09:31:59 -04:00
Subhrajyoty Roy
b61fb98466
community[patch]: callback before yield for friendli (#26842)
**Description:** Moves yield to after callback for `_stream` and
`_astream` function for the friendli model in the community package
**Issue:** #16913
2024-09-25 09:31:12 -04:00
ccurme
13acf9e6b0
langchain[patch]: add deprecation warnings (#26853) 2024-09-25 09:26:44 -04:00
William FH
82b5b77940
[Core] Add more interops tests (#26841)
To test that the client propagates both ways
2024-09-24 20:18:20 -07:00
William FH
9b6ac41442
[Core] Inherit tracing metadata & tags (#26838) 2024-09-24 19:33:12 -07:00
Erick Friis
425c0f381f
experimental: release 0.3.1 (#26830) 2024-09-24 15:03:05 -07:00
John
6c3ea262c8
partners/unstructured: release 0.1.5 (#26831)
**Description:** update package version to support loading URLs #26670
**Issue:**  #26697
2024-09-24 15:02:53 -07:00
mercyspirit
0414be4b80
experimental[major]: CVE-2024-46946 fix (#26783)
Description: Resolve CVE-2024-46946 by switching out sympify with
parse_expr with a very specific allowed set of operations.

https://nvd.nist.gov/vuln/detail/cve-2024-46946

Sympify uses eval which makes it vulnerable to code execution.
parse_expr is limited to specific expressions.

Bandit results

![image](https://github.com/user-attachments/assets/170a6376-7028-4e70-a7ef-9acfb49c1d8a)

---------

Co-authored-by: aqiu7 <aqiu7@gatech.edu>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-09-24 21:37:56 +00:00
Subhrajyoty Roy
b1da532522
community[patch]: callback before yield for deepsparse llm (#26822)
**Description:** Moves yield to after callback for `_stream` and
`_astream` function for the deepsparse model in the community package
**Issue:** #16913
2024-09-24 13:55:52 -04:00
Nuno Campos
de70a64e3a
core: Run LangChainTracer inline (#26797)
- this flag ensures the tracer always runs in the same thread as the run
being traced for both sync and async runs
- pro: less chance for ordering bugs and other oddities
- blocking the event loop is not a concern given all code in the tracer
holds the GIL anyway
2024-09-24 08:31:18 -07:00
Jorge Piedrahita Ortiz
408a930d55
community: Add Sambanova Cloud Chat model community integration (#26333)
**Description:** : Add SambaNova Cloud Chat model community integration
Includes 
- chat model integration (following Standardize ChatModel docstrings)
-  tests
- docs usage notebook (following Standardize ChatModel integration docs)

https://cloud.sambanova.ai/

---------

Co-authored-by: luisfucros <luisfucros@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-09-24 14:11:32 +00:00
Tom
2b83c7c3ab
community[patch]: Fix tool_calls parsing when streaming from DeepInfra (#26813)
- **Description:** This PR fixes the response parsing logic for
`ChatDeepInfra`, more specifially `_convert_delta_to_message_chunk()`,
which is invoked when streaming via `ChatDeepInfra`.
- **Issue:** Streaming from DeepInfra via `ChatDeepInfra` is currently
broken because the response parsing logic doesn't handle that
`tool_calls` can be `None`. (There is no GitHub issue for this problem
yet.)
- **Dependencies:** –
- **Twitter handle:** –

Keeping this here as a reminder:
> If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-09-24 13:47:36 +00:00
Subhrajyoty Roy
997d95c8f8
community[patch]: callback before yield for bedrock llm (#26804)
**Description:** Moves yield to after callback for
`_prepare_input_and_invoke_stream` and
`_aprepare_input_and_invoke_stream` for bedrock llm in community
package.
**Issue:** #16913
2024-09-24 12:14:59 +00:00
ccurme
2a4c5713cd
openai[patch]: fix azure integration tests (#26791) 2024-09-23 17:49:15 -04:00
Mohammad Mohtashim
154a5ff7ca
core[patch]: On Chain Start Fix for Chain Class (#26593)
- **Issue:** #26588

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-09-23 19:30:59 +00:00
ccurme
bba7af903b
core[patch]: set default on Blob (#26787)
Resolves https://github.com/langchain-ai/langchain/issues/26781
2024-09-23 18:55:56 +00:00
ccurme
97b27f0930
langchain[patch]: fix extended tests (#26788)
Broken by addition of `disabled_params`
2024-09-23 18:52:09 +00:00
Bagatur
e1e4f88b3e
openai[patch]: enable Azure structured output, parallel_tool_calls=Fa… (#26599)
…lse, tool_choice=required

response_format=json_schema, tool_choice=required, parallel_tool_calls
are all supported for gpt-4o on azure.
2024-09-22 22:25:22 -07:00
Gabriel Altay
bb40a0fb32
Remove pydantic restricted namespaces from HuggingFaceInferenceAPIEmbedings (#26744)
without this `model_config` importing this package produces warnings
about "model_name" having conflicts with protected namespace "model_".

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-09-22 08:05:37 -04:00
Gor Hayrapetyan
f97ac92f00
community[patch]: Handle empty PR body in get_pull_request in Github utility (#26739)
**Description:**
When PR body is empty `get_pull_request` method fails with bellow
exception.


**Issue:**
```
TypeError('expected string or buffer')Traceback (most recent call last):


  File ".../.venv/lib/python3.9/site-packages/langchain_core/tools/base.py", line 661, in run
    response = context.run(self._run, *tool_args, **tool_kwargs)


  File ".../.venv/lib/python3.9/site-packages/langchain_community/tools/github/tool.py", line 52, in _run
    return self.api_wrapper.run(self.mode, query)


  File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 816, in run
    return json.dumps(self.get_pull_request(int(query)))


  File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 495, in get_pull_request
    add_to_dict(response_dict, "body", pull.body)


  File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 487, in add_to_dict
    tokens = get_tokens(value)


  File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 483, in get_tokens
    return len(tiktoken.get_encoding("cl100k_base").encode(text))


  File "....venv/lib/python3.9/site-packages/tiktoken/core.py", line 116, in encode
    if match := _special_token_regex(disallowed_special).search(text):


TypeError: expected string or buffer
```

**Twitter:**  __gorros__
2024-09-22 01:56:24 +00:00
Erick Friis
238a31bbd9
core: release 0.3.5 (#26737) 2024-09-21 00:26:39 +00:00
William FH
55af6fbd02
[LangChainTracer] Omit Chunk (#26602)
in events / new llm token
2024-09-20 17:10:34 -07:00
Anton Dubovik
3e2cb4e8a4
openai: embeddings: supported chunk_size when check_embedding_ctx_length is disabled (#23767)
Chunking of the input array controlled by `self.chunk_size` is being
ignored when `self.check_embedding_ctx_length` is disabled. Effectively,
the chunk size is assumed to be equal 1 in such a case. This is
suprising.

The PR takes into account `self.chunk_size` passed by the user.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-20 16:58:45 -07:00
William FH
864020e592
[Tracer] add project name to run from tracer (#26736) 2024-09-20 16:48:37 -07:00
Nithish Raghunandanan
2d21274bf6
couchbase: Add ttl support to caches & chat_message_history (#26214)
**Description:** Add support to delete documents automatically from the
caches & chat message history by adding a new optional parameter, `ttl`.


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-20 23:44:29 +00:00
Krishna Kulkarni
c6c508ee96
Refining Skip Count Calculation by Filtering Documents with session_id (#26020)
In the previous implementation, `skip_count` was counting all the
documents in the collection. Instead, we want to filter the documents by
`session_id` and calculate `skip_count` by subtracting `history_size`
from the filtered count.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-09-20 23:40:56 +00:00
Tibor Reiss
a8b24135a2
fix[experimental]: Fix text splitter with gradient (#26629)
Fixes #26221

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-20 23:35:50 +00:00
Alejandro Rodríguez
4ac9a6f52c
core: fix "template" not allowed as prompt param (#26060)
- **Description:**  fix "template" not allowed as prompt param
- **Issue:** #26058
- **Dependencies:** none


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-20 23:33:06 +00:00
Christophe Bornet
58f339a67c
community: Fix links in GraphVectorStore pydoc (#25959)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-20 23:17:53 +00:00
Christophe Bornet
e49c413977
core: Add docstring for GraphVectorStoreRetriever (#26224)
Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-09-20 23:16:37 +00:00
Lucain
a2023a1e96
huggingface; fix huggingface_endpoint.py (initialize clients only with supported kwargs) (#26378)
## Description

By default, `HuggingFaceEndpoint` instantiates both the
`InferenceClient` and the `AsyncInferenceClient` with the
`"server_kwargs"` passed as input. This is an issue as both clients
might not support exactly the same kwargs. This has been highlighted in
https://github.com/huggingface/huggingface_hub/issues/2522 by
@morgandiverrez with the `trust_env` parameter. In order to make
`langchain` integration future-proof, I do think it's wiser to forward
only the supported parameters to each client. Parameters that are not
supported are simply ignored with a warning to the user. From a
`huggingface_hub` maintenance perspective, this allows us much more
flexibility as we are not constrained to support the exact same kwargs
in both clients.

## Issue

https://github.com/huggingface/huggingface_hub/issues/2522

## Dependencies

None

## Twitter 

https://x.com/Wauplin

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-20 16:05:24 -07:00
ccurme
f2285376a5
community[patch]: add web loader tests (#26728) 2024-09-20 18:29:54 -04:00
Erick Friis
4a2745064a
core: release 0.3.4 (#26729) 2024-09-20 14:47:15 -07:00
Nuno Campos
345edeb1f0
core: In astream_events propagate cancellation reason to inner task (#26727)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-09-20 14:42:10 -07:00