[HotFix] update load lora model Readme (#6240)

* [fix] update load lora model Readme
* [fix] update lora infer readme
* [fix] remove useless comments
This commit is contained in: parent 6d676ee0e9, commit 44d4053fec
@@ -892,6 +892,63 @@ The dialogues can be multi-turn and can contain a system prompt. For more d
We use bf16 weights for finetuning. If you downloaded fp8 DeepSeek V3/R1 weights, you can use the [script](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py) to convert the weights to bf16 via GPU. For Ascend NPU, you can use this [script](https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/MindIE/LLM/DeepSeek/DeepSeek-V2/NPU_inference/fp8_cast_bf16.py).
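For reference, the GPU script is typically invoked as `python fp8_cast_bf16.py --input-fp8-hf-path <fp8_dir> --output-bf16-hf-path <bf16_dir>`; the flag names are taken from the DeepSeek-V3 repository and the paths are placeholders, so check the script's argument parser if anything has changed.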
We have also added details on how to load and run inference with LoRA models.
```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
from peft import (
    PeftModel,
)
import torch

# Set model paths
model_name = "Qwen/Qwen2.5-3B"
lora_adapter = "Qwen2.5-3B_lora"  # path to your LoRA adapter
merged_model_path = "Qwen2.5-3B_merged"

######
# How to load a LoRA model
######
# 1. Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
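# Note: device_map="auto" (via accelerate) spreads the weights across the
# available devices automatically; drop it to load on a single device.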

# 2. Load the LoRA adapter on top of the base model
peft_model = PeftModel.from_pretrained(
    base_model,
    lora_adapter,
    torch_dtype=torch.bfloat16,
)

# 3. Merge the LoRA weights into the base model
merged_model = peft_model.merge_and_unload()
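# merge_and_unload() folds the low-rank updates into the base weights and
# returns a plain transformers model, so peft is not needed at inference time.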

# 4. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    pad_token="<|endoftext|>",
)
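# Setting pad_token explicitly guards against checkpoints that do not
# define one; "<|endoftext|>" is the usual choice for Qwen models.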

# 5. Save the merged model and tokenizer
merged_model.save_pretrained(
    merged_model_path,
    safe_serialization=True,
)
tokenizer.save_pretrained(merged_model_path)
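# safe_serialization=True writes the weights as .safetensors shards rather
# than pickle-based .bin files.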

# 6. Run inference with the merged model
test_input = tokenizer("Instruction: Finding prime numbers up to 100\nAnswer:", return_tensors="pt").to("cuda")
output = merged_model.generate(**test_input, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
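Once saved, the merged checkpoint behaves like any regular Hugging Face model, so later runs can skip `peft` entirely. A minimal sketch of reloading it, reusing the `Qwen2.5-3B_merged` path from above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged checkpoint produced above; no peft wrapper is involved.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen2.5-3B_merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen2.5-3B_merged", trust_remote_code=True)
```

If you prefer not to merge, you can also call `generate` directly on the `PeftModel` wrapper; merging simply removes the adapter indirection and makes the checkpoint self-contained.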
#### Usage

After preparing the dataset and model weights, you can run the script with the following command: