【diffusers Quick Start (7)】An Intuitive Understanding of Classifier-Free Guidance (CFG) and the Corresponding Code

Series Index

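【diffusers Quick Start (1)】What Does a Pipeline Actually Call? The `__call__` Method!
【diffusers Quick Start (2)】How to Get Intermediate Results of Diffusion Denoising? Pipeline Callbacks
【diffusers Quick Start (3)】How Generated Image Size Relates to the UNet and the VAE
【diffusers Quick Start (4)】What Is the EMA Operation?
【diffusers Quick Start (5)】What Does the Scheduler (noise_scheduler) Do in a Diffusion Model?
【diffusers Quick Start (6)】Gradient Accumulation and Automatic Learning-Rate Scaling, with a Code Walkthrough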


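Table of Contents

Series Index
Preface
1. How Classifier-Free Guidance (CFG) Works and What It Does
2. The Corresponding diffusers Code
    - Code location 1 (in the `__call__` function)
    - Code location 2 (in the `encode_prompt` function)
References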

Preface

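Many blog posts already explain the theory behind Classifier-Free Guidance (CFG), so this article skips the derivations and focuses on building intuition and walking through the corresponding diffusers code.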

1. How Classifier-Free Guidance (CFG) Works and What It Does

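In generative models such as diffusion models, Classifier-Free Guidance is a technique that improves generation results, in particular how well they match a given condition, without relying on an explicit classifier (unlike classifier guidance, which needs a separately trained classifier). A diffusion model gradually transforms noise into a sample from the target distribution; to make the result follow a specific condition, such as a text description, a guidance method is introduced.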

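How it works (both predictions below are made at every denoising step, on the same noisy input):

Unconditional prediction: the model first produces an "unconditional" prediction, i.e., one made without any text prompt or other condition.
Conditional prediction: the model then produces a "conditional" prediction, i.e., one made given the text prompt.
Combination: the final result is obtained by combining the unconditional and conditional predictions: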

$\text{final prediction} = \text{unconditional prediction} + w \times (\text{conditional prediction} - \text{unconditional prediction})$

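Here $w$ is the `guidance_scale` parameter, which controls how closely the result follows the text prompt. With $w = 1$ the formula reduces to the conditional prediction alone; with $w > 1$ the result is pushed further along the direction from the unconditional prediction toward the conditional one. The aim is to make the output match the given condition (such as a text description) more closely while keeping it natural.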
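A minimal sketch of the combination step on plain Python lists, to make the role of $w$ concrete. The function name `cfg_combine` and the numbers are made up for illustration; real pipelines apply the same arithmetic element-wise to tensors. It shows three regimes: $w = 0$ returns the unconditional prediction, $w = 1$ returns the conditional one, and $w > 1$ extrapolates beyond it. For reference, the original Classifier-Free Diffusion Guidance paper writes the same combination as $(1 + w)\,\text{cond} - w\,\text{uncond}$, so its $w$ corresponds to `guidance_scale - 1` in the convention used here.

```python
def cfg_combine(uncond, cond, w):
    """Classifier-free guidance, element-wise: uncond + w * (cond - uncond)."""
    return [u + w * (c - u) for u, c in zip(uncond, cond)]


# Toy 3-element "predictions" (made-up numbers, for illustration only)
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 0.0]

print(cfg_combine(uncond, cond, 0.0))  # w = 0: unconditional only -> [0.0, 1.0, 2.0]
print(cfg_combine(uncond, cond, 1.0))  # w = 1: conditional only   -> [1.0, 1.0, 0.0]
print(cfg_combine(uncond, cond, 3.0))  # w = 3: extrapolated       -> [3.0, 1.0, -4.0]
```

Note how $w = 3$ produces values (3.0 and -4.0) outside the range spanned by either prediction; this extrapolation is the mechanism behind both stronger prompt adherence and the artifacts that very large guidance scales can cause.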

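Effect:

The larger `guidance_scale` (the $w$ in the formula), the more closely the generated image follows the text prompt, but this can lower image quality or introduce unnatural details.
The smaller `guidance_scale`, the more natural (realistic) the image tends to be, but it may follow the text prompt less closely.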

2. The Corresponding diffusers Code

Taking /path/to/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py as an example, the two main CFG-related names are `guidance_scale` and `do_classifier_free_guidance`.

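`guidance_scale`: controls the strength of CFG, i.e., how closely the generated image follows the text prompt. CFG is enabled by setting `guidance_scale > 1`. The higher `guidance_scale` is, the more closely the image follows the prompt, usually at some cost in image quality.
`do_classifier_free_guidance`: a boolean that turns CFG on or off. In the pipeline it is a property computed as `guidance_scale > 1` rather than an argument the user passes to `__call__`, and it is forwarded as a parameter to `encode_prompt`. When it is True, the model's predictions are combined with the formula above.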
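A minimal sketch of how the on/off switch can be derived from the scale alone. `ToyGuidedPipeline` is a hypothetical class that models only this switch; in diffusers the scale is stored in `_guidance_scale` inside `__call__`, whereas here it is set in the constructor for simplicity.

```python
class ToyGuidedPipeline:
    """Hypothetical stand-in for a pipeline; only the CFG switch is modeled."""

    def __init__(self, guidance_scale: float):
        self._guidance_scale = guidance_scale

    @property
    def guidance_scale(self) -> float:
        return self._guidance_scale

    @property
    def do_classifier_free_guidance(self) -> bool:
        # At w == 1 the CFG formula equals the conditional prediction,
        # so the extra unconditional forward pass is only worth it when w > 1.
        return self._guidance_scale > 1


for w in (0.0, 1.0, 1.5, 7.0):
    print(w, ToyGuidedPipeline(w).do_classifier_free_guidance)
```

A consequence of this design: any value at or below 1 disables CFG entirely, so, for example, `guidance_scale=0.5` does not mean "weak guidance"; the pipeline simply uses the conditional prediction, exactly as with 1.0.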

Code location 1 (in the `__call__` function)

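The line `noise_pred.chunk(2)` splits the model's output into two halves: `noise_pred_uncond` is the unconditional prediction and `noise_pred_text` is the conditional one. The split works because, when CFG is enabled, `__call__` concatenates `negative_prompt_embeds` in front of `prompt_embeds` and duplicates the latents, so a single forward pass covers both branches and the first half of the output belongs to the negative (unconditional) branch. Handling the negative prompt embeddings this way lets the model, under CFG, produce results that better match what the user asked for.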

```python
# perform guidance
if self.do_classifier_free_guidance:
    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)
```
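A minimal end-to-end sketch of one guided step, using plain Python lists instead of tensors so it runs without PyTorch. `toy_model` is a hypothetical denoiser, and the "embeddings" are plain strings standing in for real embedding tensors (an empty string plays the role of the empty negative prompt). List concatenation stands in for `torch.cat`, and slicing in half stands in for `noise_pred.chunk(2)`.

```python
def toy_model(latents, embeds):
    """Hypothetical denoiser: adds 1.0 when the conditioning string is non-empty."""
    return [x + (1.0 if e else 0.0) for x, e in zip(latents, embeds)]


def cfg_step(model, latents, prompt_embeds, negative_prompt_embeds, guidance_scale):
    # 1. Double the batch: negative (unconditional) entries first, positive second,
    #    mirroring the [negative, positive] concatenation order in the pipeline.
    embeds = negative_prompt_embeds + prompt_embeds
    latent_model_input = latents + latents  # stands in for torch.cat([latents] * 2)
    # 2. A single forward pass over the doubled batch.
    noise_pred = model(latent_model_input, embeds)
    # 3. Split in half, like noise_pred.chunk(2): the first half is unconditional.
    half = len(noise_pred) // 2
    noise_pred_uncond, noise_pred_text = noise_pred[:half], noise_pred[half:]
    # 4. Apply the CFG formula element-wise.
    return [u + guidance_scale * (t - u) for u, t in zip(noise_pred_uncond, noise_pred_text)]


print(cfg_step(toy_model, [0.5, -0.5], ["a cat", "a dog"], ["", ""], 3.0))  # [3.5, 2.5]
```

The concatenation order and the split order must agree: if the positive embeddings were placed first, `noise_pred_uncond` would actually hold the conditional predictions and the formula would push the result away from the prompt instead of toward it.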

Code location 2 (in the `encode_prompt` function)

`do_classifier_free_guidance` and `negative_prompt_embeds`:

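The negative prompt is processed only when `do_classifier_free_guidance` is True and the negative prompt embeddings (`negative_prompt_embeds`) are None, i.e., the caller has not supplied precomputed embeddings. If no negative prompt is given, the empty string "" is used, so the "unconditional" branch is in practice conditioned on an empty prompt; `negative_prompt_2` and `negative_prompt_3` fall back to `negative_prompt`.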

```python
...
if do_classifier_free_guidance and negative_prompt_embeds is None:
    negative_prompt = negative_prompt or ""
    negative_prompt_2 = negative_prompt_2 or negative_prompt
    negative_prompt_3 = negative_prompt_3 or negative_prompt

    # normalize str to list
    negative_prompt = batch_size * [negative_prompt] if isinstance(negative_prompt, str) else negative_prompt
    negative_prompt_2 = (
        batch_size * [negative_prompt_2] if isinstance(negative_prompt_2, str) else negative_prompt_2
    )
    negative_prompt_3 = (
        batch_size * [negative_prompt_3] if isinstance(negative_prompt_3, str) else negative_prompt_3
    )
...
```
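To see what those fallback and normalization lines produce, here is a standalone replay of just that logic. `normalize_negative_prompts` is a hypothetical helper written for illustration, not a diffusers function. In SD3 the three prompts feed three text encoders (two CLIP encoders and a T5 encoder), which is why there are three negative prompts.

```python
from typing import List, Optional, Tuple, Union

PromptArg = Optional[Union[str, List[str]]]


def normalize_negative_prompts(
    batch_size: int,
    negative_prompt: PromptArg = None,
    negative_prompt_2: PromptArg = None,
    negative_prompt_3: PromptArg = None,
) -> Tuple[List[str], List[str], List[str]]:
    # Fallbacks: "" for the first prompt, then the first prompt for the other two
    negative_prompt = negative_prompt or ""
    negative_prompt_2 = negative_prompt_2 or negative_prompt
    negative_prompt_3 = negative_prompt_3 or negative_prompt

    def to_list(p):
        # A single string is repeated once per prompt in the batch
        return batch_size * [p] if isinstance(p, str) else p

    return to_list(negative_prompt), to_list(negative_prompt_2), to_list(negative_prompt_3)


print(normalize_negative_prompts(2))
print(normalize_negative_prompts(1, "blurry", negative_prompt_3="low quality"))
```

With no arguments, every encoder receives a batch of empty strings, which is exactly the "empty prompt" that serves as the unconditional branch.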

References

guidance_scale (float, optional, defaults to 5.0):
Guidance scale as defined in Classifier-Free Diffusion Guidance.
guidance_scale is defined as w of equation 2. of Imagen Paper.
