Task Objectives

Kolors (可图) LoRA Style Story Challenge, Tianchi Innovative Application Competition

  1. Participants train a LoRA model on top of the Kolors base model to produce any style they like, such as ink wash, watercolor, cyberpunk, Japanese anime, and so on.
  2. Using the LoRA model, generate 8 images that form a coherent story; the story content is up to you. Entries are judged on the aesthetic appeal of the LoRA style and the coherence of the 8-image story.

Hands-On Walkthrough

Activating the Alibaba Cloud PAI-DSW free trial is covered in detail in the official tutorial, which also walks through creating a PAI instance from the ModelScope community.

In addition to using a PAI instance as the tutorial suggests, this post experiments with running as much as possible locally.

git lfs install
git clone https://www.modelscope.cn/datasets/maochase/kolors.git

Inside the cloned folder, besides the baseline there are also Data-Juicer and DiffSynth-Studio:

Data-Juicer: a data processing and transformation toolkit that streamlines extracting, transforming, and loading data
DiffSynth-Studio: a toolkit for efficient fine-tuning of large models
!pip install simple-aesthetics-predictor
!pip install -v -e data-juicer
!pip uninstall pytorch-lightning -y
!pip install peft lightning pandas torchvision
!pip install -e DiffSynth-Studio

The dataset download step is straightforward; next comes the LoRA fine-tuning. Some of the parameters can be adjusted to your needs; for example, lora_rank controls the parameter count of the LoRA model (a rough sketch of this relationship follows the training command below):

# Download the base models
from diffsynth import download_models
download_models(["Kolors", "SDXL-vae-fp16-fix"])

# Train the model
import os

cmd = """
python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py \
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
  --lora_rank 16 \
  --lora_alpha 4.0 \
  --dataset_path data/lora_dataset_processed \
  --output_path ./models \
  --max_epochs 1 \
  --center_crop \
  --use_gradient_checkpointing \
  --precision "16-mixed"
""".strip()

os.system(cmd)
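
As a rough sanity check on how lora_rank sets the trainable parameter count: for a single frozen d_out x d_in projection, LoRA adds matrices A (rank x d_in) and B (d_out x rank), i.e. rank * (d_in + d_out) extra trainable weights. A tiny sketch (the 1280-wide layer is an illustrative assumption, not a value read from the Kolors UNet):

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    # LoRA trains A (rank x d_in) and B (d_out x rank) next to the frozen weight,
    # so the trainable parameter count grows linearly with the rank.
    return rank * (d_in + d_out)

print(lora_param_count(1280, 1280, 16))  # 40960 extra weights per projection at rank 16
print(lora_param_count(1280, 1280, 32))  # 81920: doubling the rank doubles the count

Summed over all the to_q/to_k/to_v/to_out projections that the script wraps, these per-layer additions roughly make up the 23.2 M trainable parameters reported in the log below.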

The training log is recorded here for reference:

Loading models from: models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors
    model_name: sdxl_unet model_class: SDXLUNet
        This model is initialized with extra kwargs: {'is_kolors': True}
    The following models are loaded: ['sdxl_unet'].
Loading models from: models/kolors/Kolors/text_encoder
Loading checkpoint shards: 100%|██████████| 7/7 [00:06<00:00,  1.02it/s]
/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Using 16bit Automatic Mixed Precision (AMP)
/usr/local/lib/python3.8/dist-packages/lightning/pytorch/plugins/precision/amp.py:52: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA A100-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
Missing logger folder: models/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name | Type              | Params | Mode
--------------------------------------------------
0 | pipe | SDXLImagePipeline | 8.9 B  | eval
--------------------------------------------------
23.2 M    Trainable params
8.9 B     Non-trainable params
8.9 B     Total params
35,719.684 Total estimated model params size (MB)
/usr/local/lib/python3.8/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=127` in the `DataLoader` to improve performance.
    The following models are loaded: ['kolors_text_encoder'].
Loading models from: models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors
    model_name: sdxl_vae_encoder model_class: SDXLVAEEncoder
        This model is initialized with extra kwargs: {'upcast_to_float32': True}
    model_name: sdxl_vae_decoder model_class: SDXLVAEDecoder
        This model is initialized with extra kwargs: {'upcast_to_float32': True}
    The following models are loaded: ['sdxl_vae_encoder', 'sdxl_vae_decoder'].
No sdxl_text_encoder models available.
No sdxl_text_encoder_2 models available.
Using kolors_text_encoder from models/kolors/Kolors/text_encoder.
Using sdxl_unet from models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors.
Using sdxl_vae_decoder from models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors.
Using sdxl_vae_encoder from models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors.
No sdxl_ipadapter models available.
No sdxl_ipadapter_clip_image_encoder models available.
Switch to Kolors. The prompter and scheduler will be replaced.
Epoch 0: 100%|██████████| 500/500 [08:06<00:00,  1.03it/s, v_num=0, train_loss=0.314]   
/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context:  # type: ignore[attr-defined]
`Trainer.fit` stopped: `max_epochs=1` reached.
Epoch 0: 100%|██████████| 500/500 [08:07<00:00,  1.03it/s, v_num=0, train_loss=0.314]
0

At this point the run occupies 23.80 GB of RAM and 20 GB of GPU memory.

Next, inspect the training script's command-line arguments:

!python3 DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py -h
usage: train_kolors_lora.py [-h] --pretrained_unet_path PRETRAINED_UNET_PATH
                            --pretrained_text_encoder_path
                            PRETRAINED_TEXT_ENCODER_PATH
                            --pretrained_fp16_vae_path
                            PRETRAINED_FP16_VAE_PATH
                            [--lora_target_modules LORA_TARGET_MODULES]
                            --dataset_path DATASET_PATH
                            [--output_path OUTPUT_PATH]
                            [--steps_per_epoch STEPS_PER_EPOCH]
                            [--height HEIGHT] [--width WIDTH] [--center_crop]
                            [--random_flip] [--batch_size BATCH_SIZE]
                            [--dataloader_num_workers DATALOADER_NUM_WORKERS]
                            [--precision {32,16,16-mixed}]
                            [--learning_rate LEARNING_RATE]
                            [--lora_rank LORA_RANK] [--lora_alpha LORA_ALPHA]
                            [--use_gradient_checkpointing]
                            [--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES]
                            [--training_strategy {auto,deepspeed_stage_1,deepspeed_stage_2,deepspeed_stage_3}]
                            [--max_epochs MAX_EPOCHS]
                            [--modelscope_model_id MODELSCOPE_MODEL_ID]
                            [--modelscope_access_token MODELSCOPE_ACCESS_TOKEN]

Simple example of a training script.

optional arguments:
  -h, --help            show this help message and exit
  --pretrained_unet_path PRETRAINED_UNET_PATH
                        Path to pretrained model (UNet). For example, `models/
                        kolors/Kolors/unet/diffusion_pytorch_model.safetensors
                        `.
  --pretrained_text_encoder_path PRETRAINED_TEXT_ENCODER_PATH
                        Path to pretrained model (Text Encoder). For example,
                        `models/kolors/Kolors/text_encoder`.
  --pretrained_fp16_vae_path PRETRAINED_FP16_VAE_PATH
                        Path to pretrained model (VAE). For example,
                        `models/kolors/Kolors/sdxl-vae-
                        fp16-fix/diffusion_pytorch_model.safetensors`.
  --lora_target_modules LORA_TARGET_MODULES
                        Layers with LoRA modules.
  --dataset_path DATASET_PATH
                        The path of the Dataset.
  --output_path OUTPUT_PATH
                        Path to save the model.
  --steps_per_epoch STEPS_PER_EPOCH
                        Number of steps per epoch.
  --height HEIGHT       Image height.
  --width WIDTH         Image width.
  --center_crop         Whether to center crop the input images to the
                        resolution. If not set, the images will be randomly
                        cropped. The images will be resized to the resolution
                        first before cropping.
  --random_flip         Whether to randomly flip images horizontally
  --batch_size BATCH_SIZE
                        Batch size (per device) for the training dataloader.
  --dataloader_num_workers DATALOADER_NUM_WORKERS
                        Number of subprocesses to use for data loading. 0
                        means that the data will be loaded in the main
                        process.
  --precision {32,16,16-mixed}
                        Training precision
  --learning_rate LEARNING_RATE
                        Learning rate.
  --lora_rank LORA_RANK
                        The dimension of the LoRA update matrices.
  --lora_alpha LORA_ALPHA
                        The weight of the LoRA update matrices.
  --use_gradient_checkpointing
                        Whether to use gradient checkpointing.
  --accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
                        The number of batches in gradient accumulation.
  --training_strategy {auto,deepspeed_stage_1,deepspeed_stage_2,deepspeed_stage_3}
                        Training strategy
  --max_epochs MAX_EPOCHS
                        Number of epochs.
  --modelscope_model_id MODELSCOPE_MODEL_ID
                        Model ID on ModelScope (https://www.modelscope.cn/).
                        The model will be uploaded to ModelScope automatically
                        if you provide a Model ID.
  --modelscope_access_token MODELSCOPE_ACCESS_TOKEN
                        Access key on ModelScope (https://www.modelscope.cn/).
                        Required if you want to upload the model to
                        ModelScope.

Load the models and generate images:

from diffsynth import ModelManager, SDXLImagePipeline
from peft import LoraConfig, inject_adapter_in_model
import torch


def load_lora(model, lora_rank, lora_alpha, lora_path):
    lora_config = LoraConfig(
        r=lora_rank,
        lora_alpha=lora_alpha,
        init_lora_weights="gaussian",
        target_modules=["to_q", "to_k", "to_v", "to_out"],
    )
    model = inject_adapter_in_model(lora_config, model)
    state_dict = torch.load(lora_path, map_location="cpu")
    model.load_state_dict(state_dict, strict=False)
    return model


# Load models
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
                             file_path_list=[
                                 "models/kolors/Kolors/text_encoder",
                                 "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
                                 "models/kolors/Kolors/vae/diffusion_pytorch_model.safetensors"
                             ])
pipe = SDXLImagePipeline.from_model_manager(model_manager)

# Load LoRA
pipe.unet = load_lora(
    pipe.unet,
    lora_rank=16, # This parameter should be consistent with that in your training script.
    lora_alpha=2.0, # lora_alpha can control the weight of LoRA.
    lora_path="models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt"
)
Loading models from: models/kolors/Kolors/text_encoder
Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]
    The following models are loaded: ['kolors_text_encoder'].
Loading models from: models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors
    model_name: sdxl_unet model_class: SDXLUNet
        This model is initialized with extra kwargs: {'is_kolors': True}
    The following models are loaded: ['sdxl_unet'].
Loading models from: models/kolors/Kolors/vae/diffusion_pytorch_model.safetensors
    model_name: sdxl_vae_encoder model_class: SDXLVAEEncoder
        This model is initialized with extra kwargs: {'upcast_to_float32': True}
    model_name: sdxl_vae_decoder model_class: SDXLVAEDecoder
        This model is initialized with extra kwargs: {'upcast_to_float32': True}
    The following models are loaded: ['sdxl_vae_encoder', 'sdxl_vae_decoder'].
/root/.conda/envs/kolor/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
No sdxl_text_encoder models available.
No sdxl_text_encoder_2 models available.
Using kolors_text_encoder from models/kolors/Kolors/text_encoder.
Using sdxl_unet from models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors.
Using sdxl_vae_decoder from models/kolors/Kolors/vae/diffusion_pytorch_model.safetensors.
Using sdxl_vae_encoder from models/kolors/Kolors/vae/diffusion_pytorch_model.safetensors.
No sdxl_ipadapter models available.
No sdxl_ipadapter_clip_image_encoder models available.
Switch to Kolors. The prompter and scheduler will be replaced.
/tmp/ipykernel_124608/1083844865.py:14: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(lora_path, map_location="cpu")
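
The torch.load FutureWarning above can be silenced by opting into tensor-only loading (available since PyTorch 1.13), assuming the checkpoint file contains nothing but tensors:

state_dict = torch.load(lora_path, map_location="cpu", weights_only=True)  # refuses to unpickle arbitrary objects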

Now generate an image:

torch.manual_seed(0)
image = pipe(
    prompt="二次元,一个紫色短发小女孩,在家中沙发上坐着,双手托着腮,很无聊,全身,粉色连衣裙",
    negative_prompt="丑陋、变形、嘈杂、模糊、低对比度",
    cfg_scale=4,
    num_inference_steps=50, height=1024, width=1024,
)
image.save("1.jpg")

At this point, 35.15 GB of RAM and 25 GB of GPU memory are in use.

Note that a kernel restart is needed before the inference step; otherwise the GPU memory left over from the earlier cells will not be enough and you will hit an out-of-memory error. (If you prefer not to restart, see the sketch below.)
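
Releasing large objects and clearing the CUDA cache sometimes frees enough memory; a best-effort sketch (a kernel restart remains the most reliable fix):

import gc
import torch

# Drop references to any large objects created earlier (e.g. a previously built pipe),
# then ask Python and the CUDA allocator to actually release the memory.
for name in ("pipe", "model_manager"):
    if name in globals():
        del globals()[name]
gc.collect()
torch.cuda.empty_cache()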


The full set of generated images:

[Image: the eight generated images]

Close Reading


Collecting all the code in the baseline, we can analyze its overall structure:

  1. Import libraries: bring in everything the code depends on, including data-juicer and the fine-tuning toolkit DiffSynth-Studio
  2. Dataset construction: download the training data (AI-ModelScope/lowres_anime) and preprocess it
  3. Model fine-tuning: run the LoRA fine-tuning, then load the trained model
  4. Image generation: call the fine-tuned model to generate images
# Install Data-Juicer and DiffSynth-Studio
!pip install simple-aesthetics-predictor # aesthetics scorer
!pip install -v -e data-juicer # install data-juicer in editable mode
!pip uninstall pytorch-lightning -y # remove the legacy pytorch-lightning package
!pip install peft lightning pandas torchvision # training dependencies
!pip install -e DiffSynth-Studio # install DiffSynth-Studio in editable mode

# Download the AI-ModelScope/lowres_anime dataset from ModelScope
from modelscope.msdatasets import MsDataset  # dataset module
ds = MsDataset.load(
    'AI-ModelScope/lowres_anime',
    subset_name='default',
    split='train',
    cache_dir="/mnt/workspace/kolors/data" # cache directory
) # the downloaded dataset is bound to ds

# Build the raw training set
import json, os # json and os modules
from data_juicer.utils.mm_utils import SpecialTokens # data-juicer special tokens
from tqdm import tqdm # progress bars
os.makedirs("./data/lora_dataset/train", exist_ok=True) # folder for the raw images
os.makedirs("./data/data-juicer/input", exist_ok=True) # folder for the data-juicer input
with open("./data/data-juicer/input/metadata.jsonl", "w") as f:
    for data_id, data in enumerate(tqdm(ds)): # iterate over the dataset
        image = data["image"].convert("RGB") # convert each image to RGB
        image.save(f"/mnt/workspace/kolors/data/lora_dataset/train/{data_id}.jpg") # save the image to disk
        metadata = {"text": "二次元", "image": [f"/mnt/workspace/kolors/data/lora_dataset/train/{data_id}.jpg"]} # index entry for this image
        f.write(json.dumps(metadata)) # append the entry to ./data/data-juicer/input/metadata.jsonl
        f.write("\n")

# Configure data-juicer and filter the data
# Filtering rules
data_juicer_config = """
# global parameters
project_name: 'data-process' # project name
dataset_path: './data/data-juicer/input/metadata.jsonl'  # the index file generated above
np: 4  # number of worker processes

text_keys: 'text' # caption field name in ./data/data-juicer/input/metadata.jsonl
image_key: 'image' # image field name in ./data/data-juicer/input/metadata.jsonl
image_special_token: '<__dj__image>'

export_path: './data/data-juicer/output/result.jsonl' # index file of the samples that pass the filters

# process schedule
# a list of several process operators with their arguments
# filtering rules
process:
    - image_shape_filter: # filter by image size
        min_width: 1024 # minimum width 1024
        min_height: 1024 # minimum height 1024
        any_or_all: any # keep a sample if any of its images passes
    - image_aspect_ratio_filter: # filter by aspect ratio
        min_ratio: 0.5 # minimum aspect ratio 0.5
        max_ratio: 2.0 # maximum aspect ratio 2.0
        any_or_all: any # keep a sample if any of its images passes
"""

# Save the data-juicer config to data/data-juicer/data_juicer_config.yaml
with open("data/data-juicer/data_juicer_config.yaml", "w") as file:
    file.write(data_juicer_config.strip())
# Run the data-juicer filtering pipeline
!dj-process --config data/data-juicer/data_juicer_config.yaml


# Build the training dataset from the filtered index ./data/data-juicer/output/result.jsonl
import pandas as pd # pandas
import os, json # os and json
from PIL import Image # PIL Image
from tqdm import tqdm # progress bars
texts, file_names = [], [] # two lists: captions and file names of the kept images
os.makedirs("./data/lora_dataset_processed/train", exist_ok=True) # create ./data/lora_dataset_processed/train
with open("./data/data-juicer/output/result.jsonl", "r") as file: # open the index file produced by data-juicer
    for data_id, data in enumerate(tqdm(file.readlines())): # one JSON object per line
        data = json.loads(data) # parse the JSON line into an object
        text = data["text"] # the caption of the image
        texts.append(text) # collect the caption
        image = Image.open(data["image"][0]) # open the image via its recorded path
        image_path = f"./data/lora_dataset_processed/train/{data_id}.jpg" # destination path
        image.save(image_path) # copy the image into ./data/lora_dataset_processed/train
        file_names.append(f"{data_id}.jpg") # collect the file name
data_frame = pd.DataFrame() # empty DataFrame
data_frame["file_name"] = file_names # file-name column
data_frame["text"] = texts # caption column
data_frame.to_csv("./data/lora_dataset_processed/train/metadata.csv", index=False, encoding="utf-8-sig") # save ./data/lora_dataset_processed/train/metadata.csv
data_frame # inspect the DataFrame
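
The resulting metadata.csv has one row per kept image; since every caption is the fixed string 二次元, its head looks like this:

file_name,text
0.jpg,二次元
1.jpg,二次元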


# Download the Kolors models
from diffsynth import download_models # model downloader
download_models(["Kolors", "SDXL-vae-fp16-fix"]) # fetch Kolors and the fp16-fixed SDXL VAE
# DiffSynth-Studio ships a Kolors LoRA training script; inspect its arguments
!python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py -h


# Run the Kolors LoRA training.
# The original annotated version placed `# ...` comments after the trailing
# backslashes, which breaks shell line continuation, so the flag notes live here:
#   --lora_rank 16: rank of the LoRA update matrices; 16 trades model
#     expressiveness against compute and memory cost without hurting quality much
#   --lora_alpha 4.0: scaling factor controlling how strongly the update is applied
#   --dataset_path: the processed training dataset
#   --output_path: where checkpoints are written
#   --max_epochs 1: train for a single epoch
#   --center_crop: center-crop inputs during preprocessing
#   --use_gradient_checkpointing: trade recomputation for lower memory
#   --precision "16-mixed": mixed 16-bit precision, faster and lighter on VRAM
import os
cmd = """
python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py \
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
  --lora_rank 16 \
  --lora_alpha 4.0 \
  --dataset_path data/lora_dataset_processed \
  --output_path ./models \
  --max_epochs 1 \
  --center_crop \
  --use_gradient_checkpointing \
  --precision "16-mixed"
""".strip()
os.system(cmd) # launch training


# Load the LoRA fine-tuned model
from diffsynth import ModelManager, SDXLImagePipeline # model manager and image pipeline
from peft import LoraConfig, inject_adapter_in_model # LoRA configuration and injection utilities
import torch
# Build the LoRA config, inject it into the model, and load the trained weights
def load_lora(model, lora_rank, lora_alpha, lora_path):
    lora_config = LoraConfig(
        r=lora_rank, # the LoRA rank
        lora_alpha=lora_alpha, # alpha value controlling the strength of the LoRA update
        init_lora_weights="gaussian", # initialize LoRA weights from a Gaussian
        target_modules=["to_q", "to_k", "to_v", "to_out"], # modules to wrap with LoRA
    )
    model = inject_adapter_in_model(lora_config, model) # inject the LoRA adapter into the model
    state_dict = torch.load(lora_path, map_location="cpu") # load the fine-tuned LoRA weights
    model.load_state_dict(state_dict, strict=False) # non-strict load: keys that don't match are skipped
    return model # the model with LoRA applied
# Load the pretrained models
model_manager = ModelManager(
    torch_dtype=torch.float16, # use float16 to reduce GPU memory usage
    device="cuda", # run on the GPU
    file_path_list=[
        "models/kolors/Kolors/text_encoder", # text encoder path
        "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors", # UNet path
        "models/kolors/Kolors/vae/diffusion_pytorch_model.safetensors" # VAE path
    ]
)
# Initialize the image generation pipeline
pipe = SDXLImagePipeline.from_model_manager(model_manager) # build the pipeline from the model manager
# Apply the LoRA weights to the UNet
pipe.unet = load_lora(
    pipe.unet, 
    lora_rank=16, # LoRA rank; must match the value used in the training script
    lora_alpha=2.0, # alpha value controlling how strongly the LoRA affects the model
    lora_path="models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt" # path to the LoRA checkpoint
)


# Generate an image
torch.manual_seed(0) # fix the random seed so the image is reproducible; change it for different results
image = pipe(
    prompt="二次元,一个紫色短发小女孩,在家中沙发上坐着,双手托着腮,很无聊,全身,粉色连衣裙", # positive prompt guiding the image content
    negative_prompt="丑陋、变形、嘈杂、模糊、低对比度", # negative prompt: features the model should avoid
    cfg_scale=4, # classifier-free guidance scale; higher values follow the prompt more strictly
    num_inference_steps=50, # more denoising steps give finer detail but take longer
    height=1024, width=1024, # output resolution: 1024x1024 pixels
)
image.save("1.jpg") # save the result as "1.jpg"


# Stitch the images into one overview grid
import numpy as np  # numpy for array manipulation
from PIL import Image  # PIL Image for reading and writing images
images = [np.array(Image.open(f"{i}.jpg")) for i in range(1, 9)]  # read 1.jpg through 8.jpg as arrays
image = np.concatenate([  # stack four rows vertically
    np.concatenate(images[0:2], axis=1),  # row 1: images 1-2 side by side
    np.concatenate(images[2:4], axis=1),  # row 2: images 3-4 side by side
    np.concatenate(images[4:6], axis=1),  # row 3: images 5-6 side by side
    np.concatenate(images[6:8], axis=1),  # row 4: images 7-8 side by side
], axis=0)  # the stacked result is a 2048x4096 grid
image = Image.fromarray(image).resize((1024, 2048))  # back to a PIL image, downscaled to 1024x2048
image  # display the final grid

LoRA

LoRA (Low-Rank Adaptation) is a technique for fine-tuning pretrained models efficiently. It offers a flexible, low-cost way to adapt a model to a specific task or domain while preserving its generalization ability, which matters a great deal for putting large pretrained models to practical use.

LoRA works by attaching low-rank matrices to key layers of the pretrained model. These matrices live in a much smaller parameter space, so they can be fine-tuned without altering the model's overall structure. During training only the newly added low-rank matrices are updated; the bulk of the original weights stays frozen.
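
Concretely, a frozen weight W of shape d_out x d_in is augmented as W' = W + (alpha / r) * B A, where A is r x d_in, B is d_out x r, and only A and B receive gradients. A minimal PyTorch sketch of the idea (an illustration of the mechanism, not the actual peft implementation):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen Linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)  # r x d_in
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))        # d_out x r, zero-init
        self.scaling = alpha / rank  # lora_alpha / lora_rank

    def forward(self, x):
        # y = W x + (alpha / r) * B (A x); at init B = 0, so behavior is unchanged
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

Because B is zero-initialized, training starts exactly from the pretrained behavior, and each wrapped layer contributes only rank * (d_in + d_out) trainable parameters.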


Additional Notes

Note that when processing the data with data-juicer, I ran the pipeline from a terminal rather than from the notebook; with the settings in data_juicer_config.yaml it produces ./data/data-juicer/output/result.jsonl.

from modelscope.msdatasets import MsDataset

ds = MsDataset.load(
    'AI-ModelScope/lowres_anime',
    subset_name='default',
    split='train',
    cache_dir="/root/k2/AIGC/kolors/data"
)

########## Process the data with data-juicer
data_juicer_config = """
# global parameters
project_name: 'data-process'
dataset_path: './data/data-juicer/input/metadata.jsonl'  # path to your dataset directory or file
np: 4  # number of subprocess to process your dataset

text_keys: 'text'
image_key: 'image'
image_special_token: '<__dj__image>'

export_path: './data/data-juicer/output/result.jsonl'

# process schedule
# a list of several process operators with their arguments
process:
    - image_shape_filter:
        min_width: 1024
        min_height: 1024
        any_or_all: any
    - image_aspect_ratio_filter:
        min_ratio: 0.5
        max_ratio: 2.0
        any_or_all: any
"""
with open("data/data-juicer/data_juicer_config.yaml", "w") as file:
    file.write(data_juicer_config.strip())

Run it in the terminal:

dj-process --config data/data-juicer/data_juicer_config.yaml


2024-08-09 01:03:47 | INFO     | data_juicer.config.config:618 - Back up the input config file [/root/k2/AIGC/kolors/data/data-juicer/data_juicer_config.yaml] into the work_dir [/root/k2/AIGC/kolors/data/data-juicer/output]
2024-08-09 01:03:47 | INFO     | data_juicer.config.config:640 - Configuration table: 
╒═════════════════════════╤═══════════════════════════════════════════════════════════════════════════════╕
│ key                     │ values                                                                        │
╞═════════════════════════╪═══════════════════════════════════════════════════════════════════════════════╡
│ config                  │ [Path_fr(data/data-juicer/data_juicer_config.yaml, cwd=/root/k2/AIGC/kolors)] │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ hpo_config              │ None                                                                          │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ data_probe_algo         │ 'uniform'                                                                     │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ data_probe_ratio        │ 1.0                                                                           │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ project_name            │ 'data-process'                                                                │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ executor_type           │ 'default'                                                                     │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ dataset_path            │ '/root/k2/AIGC/kolors/data/data-juicer/input/metadata.jsonl'                  │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ export_path             │ '/root/k2/AIGC/kolors/data/data-juicer/output/result.jsonl'                   │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ export_shard_size       │ 0                                                                             │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ export_in_parallel      │ False                                                                         │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ keep_stats_in_res_ds    │ False                                                                         │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ keep_hashes_in_res_ds   │ False                                                                         │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ np                      │ 4                                                                             │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ text_keys               │ 'text'                                                                        │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ image_key               │ 'image'                                                                       │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ image_special_token     │ '<__dj__image>'                                                               │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ audio_key               │ 'audios'                                                                      │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ audio_special_token     │ '<__dj__audio>'                                                               │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ video_key               │ 'videos'                                                                      │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ video_special_token     │ '<__dj__video>'                                                               │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ eoc_special_token       │ '<|__dj__eoc|>'                                                               │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ suffixes                │ []                                                                            │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ use_cache               │ True                                                                          │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ ds_cache_dir            │ '/root/.cache/huggingface/datasets'                                           │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ cache_compress          │ None                                                                          │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ use_checkpoint          │ False                                                                         │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ temp_dir                │ None                                                                          │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ open_tracer             │ False                                                                         │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ op_list_to_trace        │ []                                                                            │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ trace_num               │ 10                                                                            │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ op_fusion               │ False                                                                         │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ process                 │ [{'image_shape_filter': {'accelerator': None,                                 │
│                         │                          'any_or_all': 'any',                                 │
│                         │                          'audio_key': 'audios',                               │
│                         │                          'cpu_required': 1,                                   │
│                         │                          'image_key': 'image',                                │
│                         │                          'max_height': 9223372036854775807,                   │
│                         │                          'max_width': 9223372036854775807,                    │
│                         │                          'mem_required': 0,                                   │
│                         │                          'min_height': 1024,                                  │
│                         │                          'min_width': 1024,                                   │
│                         │                          'num_proc': 4,                                       │
│                         │                          'stats_export_path': None,                           │
│                         │                          'text_key': 'text',                                  │
│                         │                          'video_key': 'videos'}},                             │
│                         │  {'image_aspect_ratio_filter': {'accelerator': None,                          │
│                         │                                 'any_or_all': 'any',                          │
│                         │                                 'audio_key': 'audios',                        │
│                         │                                 'cpu_required': 1,                            │
│                         │                                 'image_key': 'image',                         │
│                         │                                 'max_ratio': 2.0,                             │
│                         │                                 'mem_required': 0,                            │
│                         │                                 'min_ratio': 0.5,                             │
│                         │                                 'num_proc': 4,                                │
│                         │                                 'stats_export_path': None,                    │
│                         │                                 'text_key': 'text',                           │
│                         │                                 'video_key': 'videos'}}]                      │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ percentiles             │ []                                                                            │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ export_original_dataset │ False                                                                         │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ save_stats_in_one_file  │ False                                                                         │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ ray_address             │ 'auto'                                                                        │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ debug                   │ False                                                                         │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ work_dir                │ '/root/k2/AIGC/kolors/data/data-juicer/output'                                │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ timestamp               │ '20240809010346'                                                              │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ dataset_dir             │ '/root/k2/AIGC/kolors/data/data-juicer/input'                                 │
├─────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ add_suffix              │ False                                                                         │
╘═════════════════════════╧═══════════════════════════════════════════════════════════════════════════════╛
2024-08-09 01:03:47 | INFO     | data_juicer.core.executor:47 - Using cache compression method: [None]
2024-08-09 01:03:47 | INFO     | data_juicer.core.executor:52 - Setting up data formatter...
2024-08-09 01:03:47 | INFO     | data_juicer.core.executor:74 - Preparing exporter...
2024-08-09 01:03:47 | INFO     | data_juicer.core.executor:151 - Loading dataset from data formatter...
Setting num_proc from 4 back to 1 for the jsonl split to disable multiprocessing as it only contains one shard.
Generating jsonl split: 1454 examples [00:00, 20076.04 examples/s]
2024-08-09 01:03:48 | INFO     | data_juicer.format.formatter:185 - Unifying the input dataset formats...
2024-08-09 01:03:48 | INFO     | data_juicer.format.formatter:200 - There are 1454 sample(s) in the original dataset.
Filter (num_proc=4): 100%|#############################################################################################################| 1454/1454 [00:00<00:00, 4993.64 examples/s]
2024-08-09 01:03:49 | INFO     | data_juicer.format.formatter:214 - 1454 samples left after filtering empty text.
2024-08-09 01:03:49 | INFO     | data_juicer.format.formatter:237 - Converting relative paths in the dataset to their absolute version. (Based on the directory of input dataset file)
Map (num_proc=4): 100%|################################################################################################################| 1454/1454 [00:00<00:00, 8079.14 examples/s]
2024-08-09 01:03:49 | INFO     | data_juicer.format.mixture_formatter:137 - sampled 1454 from 1454
2024-08-09 01:03:49 | INFO     | data_juicer.format.mixture_formatter:143 - There are 1454 in final dataset
2024-08-09 01:03:49 | INFO     | data_juicer.core.executor:157 - Preparing process operators...
2024-08-09 01:03:49 | INFO     | data_juicer.core.executor:164 - Processing data...
Adding new column for stats (num_proc=4): 100%|########################################################################################| 1454/1454 [00:00<00:00, 8337.08 examples/s]
image_shape_filter_compute_stats (num_proc=4): 100%|####################################################################################| 1454/1454 [00:08<00:00, 162.91 examples/s]
image_shape_filter_process (num_proc=4): 100%|#########################################################################################| 1454/1454 [00:00<00:00, 8087.59 examples/s]
2024-08-09 01:03:59 | INFO     | data_juicer.core.data:193 - OP [image_shape_filter] Done in 9.538s. Left 129 samples.
image_aspect_ratio_filter_compute_stats (num_proc=4): 100%|###############################################################################| 129/129 [00:01<00:00, 123.66 examples/s]
image_aspect_ratio_filter_process (num_proc=4): 100%|#####################################################################################| 129/129 [00:00<00:00, 799.52 examples/s]
2024-08-09 01:04:00 | INFO     | data_juicer.core.data:193 - OP [image_aspect_ratio_filter] Done in 1.372s. Left 129 samples.
2024-08-09 01:04:00 | INFO     | data_juicer.core.executor:171 - All OPs are done in 10.922s.
2024-08-09 01:04:00 | INFO     | data_juicer.core.executor:174 - Exporting dataset to disk...
2024-08-09 01:04:00 | INFO     | data_juicer.core.exporter:111 - Exporting computed stats into a single file...
Creating json from Arrow format: 100%|################################################################################################################| 1/1 [00:00<00:00, 10.24ba/s]
2024-08-09 01:04:00 | INFO     | data_juicer.core.exporter:140 - Export dataset into a single file...
Creating json from Arrow format: 100%|################################################################################################################| 1/1 [00:00<00:00, 18.43ba/s]

References

  1. AIGC video course series: https://space.bilibili.com/1069874770/channel/collectiondetail?sid=3369551
  2. AIGC topic hub: https://www.modelscope.cn/topic/cf0de97eb6284e16812d7c54fbe29fe7/pub/summary
  3. Model training entry point: https://modelscope.cn/aigc/modelTraining