Fine-tuning the Stable Diffusion 3 Model on SCNet

Published 2024-11-07 18:07:47

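1 AI Compute Feedback
1.1 Product Used
The product I ran is the "Stable Diffusion 3 text-to-image high-quality AI painting inference service", on a heterogeneous accelerator card AI 64G environment. The setup steps are as follows: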

1.1.1 Purchase the model service
First purchase the model service, so that the pretrained model no longer has to be downloaded from Hugging Face.

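1.1.2 Choose a suitable development machine

Click Model Development and configure the development environment (note that the development environment needs to be synchronized). I chose the single heterogeneous compute card 64G + pytorch dtk24.04.1 environment.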

1.1.3 Open the development environment

Once the Notebook environment is ready, click JupyterLab to open the development environment.

1.1.4 Create a new project file

Create a new ipynb file (all of the steps below are run in this notebook). Note that the Python version here must be 3.10.
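As a quick sanity check on the version requirement, a first notebook cell along these lines could confirm which interpreter the kernel is running. This is only a sketch; the helper name `is_required_python` is made up for illustration and is not part of any library.

```python
import sys

REQUIRED = (3, 10)  # from the text above: the notebook kernel should be Python 3.10


def is_required_python(version_info=sys.version_info, required=REQUIRED):
    """Return True when the major.minor version matches the required one."""
    return tuple(version_info[:2]) == tuple(required)


if is_required_python():
    print("Python version OK:", sys.version.split()[0])
else:
    found = sys.version.split()[0]
    print(f"Expected Python {REQUIRED[0]}.{REQUIRED[1]}, found {found}")
```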

1.2 Run Log

Before starting the project, define the project's working directory.

work_path="/root/private_data/apprepo/model/20240729095814/stable-diffusion-3-medium-diffusers-2407251517"
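Since every later cell changes into this directory, it may be worth confirming that it exists before going further. The helper below is a minimal sketch; `check_work_path` is a hypothetical name, and the check only assumes that the purchased model has been synced to that location.

```python
from pathlib import Path


def check_work_path(path):
    """Return the resolved directory, or raise if it does not exist."""
    p = Path(path).expanduser()
    if not p.is_dir():
        raise FileNotFoundError(f"work_path is not an existing directory: {p}")
    return p.resolve()
```

In the notebook this would be called as `check_work_path(work_path)` right after the assignment above.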

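1.2.1 Install diffusers
Hugging Face's diffusers repository is an open-source library for working with diffusion models. In this task we use it for fine-tuning, so it has to be installed first. Run the following commands so that the clone does not fail because of the size of the GitHub repository.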

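# Avoid "repository too large" failures when cloning from GitHub
# (note: the first setting below turns off TLS certificate verification)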
!git config --global http.sslVerify "false"
!git config --global http.postBuffer 1048576000
!git config --global core.compression -1
!git config --global http.lowSpeedLimit 0 
!git config --global http.lowSpeedTime 999999

Run the following command so that git stores your credentials locally, which is convenient for later development.

# Configure git to store your credentials
!git config --global credential.helper store

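Run the following commands to download and install the diffusers repository. Because of network conditions, the server may have trouble downloading it, so a shallow clone with depth 1 is recommended.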

%cd $work_path
# Clone Hugging Face's diffusers repository (remove any earlier copy first)
!rm -rf diffusers
!git clone https://github.com/huggingface/diffusers --depth 1
# Enter the diffusers directory
%cd diffusers
# Install the diffusers library in editable mode
!pip install -e .
# Write the default configuration for the Accelerate library
!accelerate config default
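To confirm that the editable install actually registered, one option is to query the installed package metadata from Python. The sketch below uses only the standard library; `installed_version` is an illustrative helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages the steps above rely on.
for name in ("diffusers", "accelerate"):
    print(name, installed_version(name) or "not installed")
```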

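1.2.2 Install the dependencies for fine-tuning
If your goal is to fine-tune the model, you also need to install the fine-tuning dependencies. Note that because we are using a domestic accelerator card whose runtime is the dtk framework, a specific build of torch is required. Installing the torchvision package, however, makes pip automatically pull in the CUDA build of torch, so we comment that package out of the requirements file before running the install command.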

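%cd $work_path/diffusers/examples/dreambooth
# Comment out torchvision so pip does not replace the dtk build of torch
!sed -i 's/torchvision/#torchvision/' requirements_sd3.txt
# Install the remaining dependencies listed in requirements_sd3.txt
!pip install -r requirements_sd3.txt -i https://pypi.mirrors.ustc.edu.cn/simple/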
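The `sed` call above prepends another `#` each time the cell is re-run. If that matters, the same edit can be done idempotently in Python. This is a sketch of the idea, not what the original notebook does. `comment_out_requirement` is a hypothetical helper, and the sample text in the usage note is made up rather than the real contents of `requirements_sd3.txt`.

```python
import re


def comment_out_requirement(text, package):
    """Comment out lines that require `package`; already-commented lines are left alone."""
    # Match the package name at the start of a line, but not longer names
    # that merely begin with it (e.g. "torch" must not match "torchvision").
    pattern = re.compile(rf"^\s*{re.escape(package)}(?![\w.-])", re.IGNORECASE)
    out = []
    for line in text.splitlines():
        out.append("#" + line if pattern.match(line) else line)
    return "\n".join(out) + "\n"
```

For example, applying it to the made-up text `"accelerate\ntorchvision\n"` with `package="torchvision"` comments out only the second line, and applying it again leaves the result unchanged.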

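Next, install torchvision manually, without its dependencies. My torch version is 2.1.0, and the matching torchvision version is 0.16. For the full mapping between torch and torchvision versions, see the torchvision repo.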

!pip install torchvision==0.16 -i https://pypi.mirrors.ustc.edu.cn/simple/ --no-deps
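The version pairing can also be looked up programmatically. In the sketch below only the 2.1 to 0.16 pair comes from this article. The other rows are recalled from the compatibility table in the torchvision README and should be double-checked there. `matching_torchvision` is an illustrative helper name.

```python
# torch minor release -> matching torchvision minor release.
# Only "2.1" -> "0.16" is taken from the article; the remaining rows are an
# assumption based on the torchvision README and should be verified.
TORCH_TO_TORCHVISION = {
    "2.0": "0.15",
    "2.1": "0.16",
    "2.2": "0.17",
    "2.3": "0.18",
}


def matching_torchvision(torch_version):
    """Map a torch version such as '2.1.0+cu121' to a torchvision minor, or None."""
    base = torch_version.split("+")[0]       # drop any local build suffix
    minor = ".".join(base.split(".")[:2])    # keep major.minor only
    return TORCH_TO_TORCHVISION.get(minor)
```

In the notebook, `matching_torchvision(torch.__version__)` could be used to choose the version passed to `pip install torchvision==...`.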

1.2.3 Fine-tune your Stable Diffusion model

The following Python cell uses the Hugging Face Hub client to download the required dataset.

%cd $work_path/diffusers/examples/dreambooth

# Download the required dataset.
from huggingface_hub import snapshot_download
# Local path where the dataset will be stored
local_dir = "./dataset/dog"
# Download the "diffusers/dog-example" dataset from the Hugging Face Hub to the local directory
snapshot_download(
    "diffusers/dog-example",
    local_dir=local_dir,
    repo_type="dataset",
    ignore_patterns=".gitattributes",
)

# The unused .huggingface directory must be deleted here
!rm -rf ./dataset/dog/.huggingface
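The cleanup step above matters presumably because the training script treats everything inside `--instance_data_dir` as a training image. That is my assumption about why the article insists on it. A small check like the following can list anything in the folder that is not a regular image file. `non_image_entries` and the suffix set are illustrative choices, not part of diffusers.

```python
from pathlib import Path

# Illustrative set of image extensions; extend it if your data uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def non_image_entries(data_dir):
    """List entries in data_dir that are not regular image files."""
    return sorted(
        p.name
        for p in Path(data_dir).iterdir()
        if not (p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    )
```

Calling `non_image_entries("./dataset/dog")` after the cleanup should return an empty list if only images remain.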

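If you want to use the official pretrained model from Hugging Face, go to the Stable Diffusion 3 page on Hugging Face, log in with your Hugging Face account, and accept the license agreement, confirming that you will not use the model commercially. After accepting the agreement, run the following command manually in a terminal: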

# Configure the token (run the command below --> enter your token --> press Enter --> type Y --> press Enter)
huggingface-cli login

If you are using the pretrained model provided by SCNet, no extra steps are needed, but please still make sure you do not use it commercially.

# Enter the diffusers/examples/dreambooth directory
%cd $work_path/diffusers/examples/dreambooth

# Launch the train_dreambooth_lora_sd3.py training script with Accelerate
!accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path=$work_path/stabilityai/stable-diffusion-3-medium-diffusers  \
  --instance_data_dir="./dataset/dog" \
  --output_dir="trained-sd3-lora" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-5 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0"

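# Note: if you use the official pretrained model and have not successfully accepted the license agreement, training may fail with the following error: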
# OSError: Can't load tokenizer for 'stabilityai/stable-diffusion-3-medium'. 
# If you were trying to load it from 'https://huggingface.co/models', 
# make sure you don't have a local directory with the same name. 
# Otherwise, make sure 'stabilityai/stable-diffusion-3-medium' is the correct 
# path to a directory containing all relevant files for a CLIPTokenizer tokenizer.
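To make the batch-related flags in the training command more concrete, the sketch below derives the effective batch size and the number of epochs implied by `--max_train_steps`. It mirrors how I understand the diffusers example scripts to compute these values, which is an assumption worth verifying against the script itself. `training_schedule` and `num_images` are hypothetical names, and the dataset size has to be supplied by you.

```python
import math


def training_schedule(num_images, train_batch_size=1,
                      gradient_accumulation_steps=4, max_train_steps=500):
    """Derive effective batch size and epoch count from the command-line flags."""
    # Samples contributing to one optimizer update.
    effective_batch = train_batch_size * gradient_accumulation_steps
    # Dataloader batches in one pass over the data.
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    # Optimizer updates in one pass over the data.
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    # Passes needed to reach max_train_steps optimizer updates.
    epochs = math.ceil(max_train_steps / updates_per_epoch)
    return {
        "effective_batch": effective_batch,
        "updates_per_epoch": updates_per_epoch,
        "epochs": epochs,
    }
```

With a hypothetical 5-image dataset and the flags used above, `training_schedule(5)` gives an effective batch of 4, 2 optimizer updates per epoch, and 250 epochs, which is the unit that `--validation_epochs=25` counts in.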

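After training finishes, you may see the error "expected scalar type Float but found Half". This appears to be a bug on the Hugging Face side and can be ignored; the training itself completed without problems.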

1.2.4 Run inference with the fine-tuned model

%cd $work_path/diffusers/examples/dreambooth/trained-sd3-lora

from diffusers import StableDiffusion3Pipeline
import torch

model_path = "./checkpoint-500"
pipe = StableDiffusion3Pipeline.from_pretrained(f"{work_path}/stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe.load_lora_weights(model_path)
pipe.to("cuda")

prompt = "A photo of sks dog in a bucket."
with torch.autocast("cuda"):
    image = pipe(prompt).images[0]
    image.save("output.png")
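If `load_lora_weights` fails, a first thing to check is whether the checkpoint folder actually contains a LoRA weight file. The sketch below assumes the file name `pytorch_lora_weights.safetensors`, which is what I recall diffusers using by default when it saves LoRA weights. Treat that name as an assumption and confirm it against the contents of your own `checkpoint-500` directory. `find_lora_weights` is an illustrative helper.

```python
from pathlib import Path

# Assumed default file name for LoRA weights saved by diffusers.
LORA_WEIGHT_NAME = "pytorch_lora_weights.safetensors"


def find_lora_weights(checkpoint_dir, weight_name=LORA_WEIGHT_NAME):
    """Return the path to the LoRA weight file inside checkpoint_dir, or None."""
    candidate = Path(checkpoint_dir) / weight_name
    return candidate if candidate.is_file() else None
```

For instance, `find_lora_weights("./checkpoint-500")` returning `None` would suggest that training did not reach the checkpointing step or that the weights were saved elsewhere.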

The inference output is shown below; the quality of the generated image is reasonably good.