Qwen (Tongyi Qianwen) Fine-Tuning and Quantization in Practice

IT大头 2024-06-06 10:37:10 Category: Large Model Fine-Tuning in Practice

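© Copyright belongs to the author. This is an original work by 51CTO blog author IT大头. Please contact the author for permission before reposting; unauthorized reposting may lead to legal action.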

Preface

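This article is a hands-on walkthrough of Qwen quantization and covers two approaches. The first fine-tunes the official pre-quantized Int4 model, so the result is, in theory, already a quantized fine-tuned model. The second fine-tunes the official full-precision model and then quantizes the fine-tuned model yourself.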

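PS: For real-world use I recommend the first approach. The second is only worth knowing about, because quantizing a model yourself is bug-prone, time-consuming, and laborious.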

1. QLoRA fine-tuning based on the official quantized model

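This approach builds on the official quantized model. The earlier steps are much the same as for LoRA fine-tuning, so for data and environment preparation please refer to my LoRA fine-tuning article. The next step is to pull the quantized model: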

git clone https://modelscope.cn/qwen/Qwen-7B-Chat-Int4.git

Set the environment variables:

export CUDA_DEVICE_MAX_CONNECTIONS=1 
export CUDA_VISIBLE_DEVICES=0
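
Before launching training, it can be worth sanity-checking the training file. The following is a minimal sketch, assuming chat.json follows the conversation format described in the official Qwen fine-tuning README: a JSON list whose items carry a `conversations` list of turns, each with a `from` field (`user` or `assistant`) and a `value` field. The names `validate_chat_data` and `validate_chat_file` are hypothetical and not part of the Qwen repository.

```python
import json

# Roles assumed from the official Qwen fine-tuning data format.
VALID_ROLES = ("user", "assistant")


def validate_chat_data(samples):
    """Return a list of human-readable problems found in the samples."""
    if not isinstance(samples, list):
        return ["top-level JSON value must be a list of samples"]
    problems = []
    for i, sample in enumerate(samples):
        if not isinstance(sample, dict):
            problems.append(f"sample {i}: not a JSON object")
            continue
        turns = sample.get("conversations")
        if not isinstance(turns, list) or not turns:
            problems.append(f"sample {i}: missing or empty 'conversations'")
            continue
        for j, turn in enumerate(turns):
            if not isinstance(turn, dict):
                problems.append(f"sample {i}, turn {j}: not a JSON object")
                continue
            if turn.get("from") not in VALID_ROLES:
                problems.append(f"sample {i}, turn {j}: unexpected role {turn.get('from')!r}")
            value = turn.get("value")
            if not isinstance(value, str) or not value.strip():
                problems.append(f"sample {i}, turn {j}: empty or non-string 'value'")
    return problems


def validate_chat_file(path):
    """Load a JSON file and validate its contents."""
    with open(path, encoding="utf-8") as f:
        return validate_chat_data(json.load(f))
```

Calling `validate_chat_file("chat.json")` returns an empty list when nothing suspicious is found, and a list of messages otherwise.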

Fine-tune the model:

python finetune.py \
  --model_name_or_path Qwen-7B-Chat-Int4 \
  --data_path chat.json \
  --fp16 True \
  --output_dir output_qwen \
  --num_train_epochs 5 \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 1000 \
  --save_total_limit 10 \
  --learning_rate 3e-4 \
  --weight_decay 0.1 \
  --adam_beta2 0.95 \
  --warmup_ratio 0.01 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --report_to "none" \
  --model_max_length 512 \
  --lazy_preprocess True \
  --gradient_checkpointing \
  --use_lora \
  --q_lora \
  --deepspeed finetune/ds_config_zero2.json

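Compared with the LoRA fine-tuning arguments, this command additionally loads the official configuration file (the DeepSpeed config passed via --deepspeed).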
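
To make the batch-related flags concrete, the arithmetic below works out how many samples contribute to each optimizer update and roughly how many updates the run performs. It is only a sketch: the dataset size of 1,000 samples is a hypothetical figure, the single GPU follows from CUDA_VISIBLE_DEVICES=0, and the rounding is an approximation of how the Hugging Face Trainer counts steps.

```python
import math


def training_schedule(num_samples, per_device_batch_size, grad_accum_steps,
                      num_gpus, num_epochs, warmup_ratio):
    """Estimate effective batch size and step counts for a training run."""
    # Samples that contribute to a single optimizer update.
    effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
    # Forward/backward passes per epoch on each device.
    batches_per_epoch = math.ceil(num_samples / (per_device_batch_size * num_gpus))
    # Optimizer updates per epoch (at least one).
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    total_updates = updates_per_epoch * num_epochs
    warmup_updates = math.ceil(total_updates * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "warmup_updates": warmup_updates,
    }


# Values from the command above; num_samples=1000 is a made-up dataset size.
schedule = training_schedule(
    num_samples=1000,
    per_device_batch_size=2,
    grad_accum_steps=8,
    num_gpus=1,
    num_epochs=5,
    warmup_ratio=0.01,
)
print(schedule)
```

With the flags above, each update sees 2 × 8 × 1 = 16 samples. For a small dataset the total number of updates can stay below --save_steps 1000, in which case the step-based save strategy would not write any intermediate checkpoint during training.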

Testing uses the same code as before:

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model together with the LoRA adapter saved in output_qwen
model = AutoPeftModelForCausalLM.from_pretrained("output_qwen", device_map="auto", trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained("output_qwen", trust_remote_code=True)
# The prompt is left empty here; put your own question in its place
response, history = model.chat(tokenizer, "", history=None)
print(response)
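
The `history` value returned by `model.chat` can be passed back in to continue the conversation. The helper below is a small sketch of that loop. `run_conversation` is a hypothetical name, and it assumes only that the model object exposes the `chat(tokenizer, query, history=...)` method used above, returning `(response, history)`.

```python
def run_conversation(model, tokenizer, queries):
    """Send queries one by one, feeding the returned history back each turn."""
    history = None  # the first turn starts without any context
    responses = []
    for query in queries:
        response, history = model.chat(tokenizer, query, history=history)
        responses.append(response)
    return responses, history
```

For example, `run_conversation(model, tokenizer, ["你好", "请介绍一下你自己"])` would ask the two questions in sequence within one conversation.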

2. AutoGPTQ quantization based on the official full-precision model

Environment:

pip install auto-gptq optimum

The official Git repository contains a script named run_gptq.py, which performs the quantization.

Run:

python run_gptq.py --model_name_or_path Qwen-7B-Chat --data_path chat.json --out_path output_qwen

The generated folder is missing some files:

modeling_qwen.py
qwen_generation_utils.py
cpp_kernels.py
generation_config.json

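Copy the missing files over from the official model directory. The complete file list is shown below.

[Figure: complete file listing of the output folder]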
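
Copying the files by hand works, but the step can also be scripted. The sketch below copies only the four files named above, and only when they are absent from the quantized output folder. `copy_missing_files` is a hypothetical helper, and the directory names Qwen-7B-Chat and output_qwen in the usage note are taken from the run_gptq.py command.

```python
import shutil
from pathlib import Path

# Files reported as missing from the quantized output folder.
MISSING_FILES = [
    "modeling_qwen.py",
    "qwen_generation_utils.py",
    "cpp_kernels.py",
    "generation_config.json",
]


def copy_missing_files(source_dir, target_dir, names=MISSING_FILES):
    """Copy each named file from source_dir to target_dir unless it already exists."""
    source, target = Path(source_dir), Path(target_dir)
    copied = []
    for name in names:
        src, dst = source / name, target / name
        if dst.exists():
            continue  # never overwrite a file that is already in the output folder
        if not src.is_file():
            raise FileNotFoundError(f"{src} not found in the source model folder")
        shutil.copy2(src, dst)
        copied.append(name)
    return copied
```

A call such as `copy_missing_files("Qwen-7B-Chat", "output_qwen")` returns the names of the files it actually copied.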

Test:

from modelscope import AutoTokenizer, AutoModelForCausalLM

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("output_qwen", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    "output_qwen",
    device_map="auto",
    trust_remote_code=True
).eval()
response, history = model.chat(tokenizer, "你好", history=None)
print(response)

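PS: In my tests the model answered with garbled text. I do not yet know the cause, or whether some step failed to run, and I am waiting for an official reply. Some people say it is a dependency version problem, but that is unconfirmed. If you got this working, please let me know how.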
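
Since a dependency version mismatch is one of the suspected causes, recording the installed versions is a reasonable first diagnostic step and makes the problem easier to report. The sketch below only prints versions. The package list is an assumption about which libraries are relevant here, and nothing in it implies that any particular version fixes the garbled output.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of packages involved in fine-tuning and quantization.
PACKAGES = ["torch", "transformers", "peft", "auto-gptq", "optimum", "modelscope"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, ver in installed_versions().items():
    print(f"{name}: {ver}")
```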