Mirror of https://github.com/hwchase17/langchain.git, synced 2025-08-07 12:06:43 +00:00
community[patch]: Fix vLLM integration to apply lora_request (#27731)
**Description:**

Add the `lora_request` parameter to the `VLLM` class to support LoRA model configurations. This enhancement allows users to specify LoRA requests directly when using VLLM, enabling more flexible and efficient model customization.

**Issue:**

- There is no existing issue for `lora_request` in VLLM; this PR addresses the need for configuring LoRA requests within the VLLM integration.
- Reference: [Using LoRA Adapters in vLLM](https://docs.vllm.ai/en/stable/models/lora.html#using-lora-adapters)

**Example Code:**

Before this change, the `lora_request` parameter was not applied:

```python
from langchain_community.llms import VLLM
from vllm.lora.request import LoRARequest

ADAPTER_PATH = "/path/of/lora_adapter"

llm = VLLM(
    model="Bllossom/llama-3.2-Korean-Bllossom-3B",
    max_new_tokens=512,
    top_k=2,
    top_p=0.90,
    temperature=0.1,
    vllm_kwargs={
        "gpu_memory_utilization": 0.5,
        "enable_lora": True,
        "max_model_len": 1024,
    },
)

print(
    llm.invoke(
        ["...prompt_content..."],
        lora_request=LoRARequest("lora_adapter", 1, ADAPTER_PATH),
    )
)
```

**Output before this change:**

```bash
response was not applied lora_request
```

This PR therefore applies the `lora_request` in `langchain_community.llms.vllm.VLLM`.

**Output after this change:**

```bash
response applied lora_request
```

**Dependencies:** None

**Lint and test:** All tests and lint checks have passed.

---------

Co-authored-by: Um Changyong <changyong.um@sfa.co.kr>
Commit dc171221b3 (parent 9d2f6701e1)
The change is in `libs/community/langchain_community/llms/vllm.py`:

```diff
@@ -125,6 +125,8 @@ class VLLM(BaseLLM):
         """Run the LLM on the given prompt and input."""
         from vllm import SamplingParams
 
+        lora_request = kwargs.pop("lora_request", None)
+
         # build sampling parameters
         params = {**self._default_params, **kwargs, "stop": stop}
 
@@ -135,7 +137,12 @@ class VLLM(BaseLLM):
         )
 
         # call the model
-        outputs = self.client.generate(prompts, sample_params)
+        if lora_request:
+            outputs = self.client.generate(
+                prompts, sample_params, lora_request=lora_request
+            )
+        else:
+            outputs = self.client.generate(prompts, sample_params)
 
         generations = []
         for output in outputs:
```
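For readers skimming the diff, a minimal sketch of the patched `_generate` body is shown below. The signature and the `SamplingParams` construction are simplified for illustration and are not taken from the diff; the key point is that `lora_request` is popped from `kwargs` before the sampling parameters are built, and is forwarded to vLLM's `generate()` only when one was supplied.

```python
# Abbreviated sketch of VLLM._generate after this patch (signature and the
# SamplingParams construction are simplified; see the diff above for the
# exact lines that changed).
def _generate(self, prompts, stop=None, run_manager=None, **kwargs):
    """Run the LLM on the given prompt and input."""
    from vllm import SamplingParams

    # Pop the optional LoRA request out of kwargs so it can be forwarded to
    # vLLM's generate() call instead of being dropped with the other kwargs.
    lora_request = kwargs.pop("lora_request", None)

    # build sampling parameters from defaults, remaining kwargs, and stop
    params = {**self._default_params, **kwargs, "stop": stop}
    sample_params = SamplingParams(**params)  # simplified for this sketch

    # call the model, passing the LoRA request only when one was supplied
    if lora_request:
        outputs = self.client.generate(
            prompts, sample_params, lora_request=lora_request
        )
    else:
        outputs = self.client.generate(prompts, sample_params)

    # ...the rest of the method turns `outputs` into an LLMResult as before.
```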