community[patch]: Fix vLLM integration to apply lora_request (#27731)

**Description:**
- Add support for the `lora_request` parameter in the `VLLM` class so that
LoRA adapter configurations can be applied at generation time. This allows
users to pass a `LoRARequest` directly when invoking `VLLM`, enabling more
flexible model customization.

**Issue:**
- No existing issue covers applying LoRA adapters through `VLLM`. This PR
addresses the need for configuring LoRA requests within the VLLM framework.
- Reference: [Using LoRA Adapters in
vLLM](https://docs.vllm.ai/en/stable/models/lora.html#using-lora-adapters),
whose native API is sketched below.
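
For context, this is roughly how the referenced vLLM documentation applies a LoRA adapter with the native `LLM` API; the model name, adapter name, and path below are placeholders, not values from this PR:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder base model; any model compatible with the adapter works.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# LoRARequest(adapter_name, adapter_id, local_adapter_path)
outputs = llm.generate(
    ["...prompt_content..."],
    sampling_params,
    lora_request=LoRARequest("lora_adapter", 1, "/path/of/lora_adapter"),
)

for output in outputs:
    print(output.outputs[0].text)
```

The fix in `langchain_community.llms.vllm.VLLM` forwards the same `lora_request` object to this underlying `generate` call.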


**Example Code:**
Before this change, the `lora_request` argument was not applied:

```python
from langchain_community.llms import VLLM
from vllm.lora.request import LoRARequest

ADAPTER_PATH = "/path/of/lora_adapter"

llm = VLLM(
    model="Bllossom/llama-3.2-Korean-Bllossom-3B",
    max_new_tokens=512,
    top_k=2,
    top_p=0.90,
    temperature=0.1,
    vllm_kwargs={
        "gpu_memory_utilization": 0.5,
        "enable_lora": True,
        "max_model_len": 1024,
    },
)

print(
    llm.invoke(
        "...prompt_content...",
        lora_request=LoRARequest("lora_adapter", 1, ADAPTER_PATH),
    )
)
```
**Before Change Output:**
```bash
response generated without the lora_request applied
```
So this PR updates `langchain_community.llms.vllm.VLLM` to apply the LoRA
adapter by forwarding `lora_request` to vLLM.

**After Change Output:**
```bash
response generated with the lora_request applied
```
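
As a rough sanity check (not part of this PR), one way to see whether the adapter is applied is to compare generations with and without `lora_request`, reusing the `llm` and `ADAPTER_PATH` from the example above:

```python
from vllm.lora.request import LoRARequest

prompt = "...prompt_content..."

# Baseline generation without the adapter.
print("base:", llm.invoke(prompt))

# Generation with the LoRA adapter applied via the new kwarg.
print(
    "lora:",
    llm.invoke(prompt, lora_request=LoRARequest("lora_adapter", 1, ADAPTER_PATH)),
)
```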

**Dependencies:**
- None

**Lint and test:**
- All tests and lint checks have passed.

---------

Co-authored-by: Um Changyong <changyong.um@sfa.co.kr>

The resulting change in `langchain_community.llms.vllm`:

```diff
@@ -125,6 +125,8 @@ class VLLM(BaseLLM):
         """Run the LLM on the given prompt and input."""
         from vllm import SamplingParams
 
+        lora_request = kwargs.pop("lora_request", None)
+
         # build sampling parameters
         params = {**self._default_params, **kwargs, "stop": stop}
@@ -135,7 +137,12 @@ class VLLM(BaseLLM):
         )
 
         # call the model
-        outputs = self.client.generate(prompts, sample_params)
+        if lora_request:
+            outputs = self.client.generate(
+                prompts, sample_params, lora_request=lora_request
+            )
+        else:
+            outputs = self.client.generate(prompts, sample_params)
 
         generations = []
         for output in outputs:
```
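
Note that `lora_request` is popped from `kwargs` before the sampling parameters are built, so it never reaches `SamplingParams`, and it is forwarded to `client.generate` only when it is provided, so calls that do not use LoRA behave exactly as before.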