Paper Reading Notes: Deep Unsupervised Learning using Nonequilibrium Thermodynamics

2025/3/16 13:33:11  Source: https://blog.csdn.net/u010948546/article/details/146276321

1. Sources

Paper link 1: http://ganguli-gang.stanford.edu/pdf/DeepUnsupDiffusion.pdf
Paper link 2 (with appendix): https://arxiv.org/pdf/1503.03585v7
Code link: https://github.com/Sohl-Dickstein/Diffusion-Probabilistic-Models
Environment setup for the code (Theano-based): https://blog.csdn.net/u010948546/article/details/146217516?spm=1001.2014.3001.5501

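2. Derivation in the Paper

The workflow of the diffusion model is shown in the figure below. $q(x^{0,1,2\cdots,T-1,T})$ is the forward noising process and $p(x^{0,1,2\cdots,T-1,T})$ is the reverse denoising process; for the details of both processes see https://blog.csdn.net/u010948546/article/details/144902864?spm=1001.2014.3001.5501. The figure also shows that the image obtained at the end of reverse denoising still contains some scattered noise specks.
[Figure: workflow of the diffusion model (image missing from the source)]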

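2.1 Terminology

$q(x^0)$: $x^0$ denotes an image from the dataset. For example, when using MNIST, $x^0$ is an image in MNIST and $q(x^0)$ is the distribution of the images in MNIST.
$p(x^T)$: $x^T$ is the result of adding noise to $x^0$ and is the starting point of reverse denoising, so $p(x^T)$ is the distribution of that starting point. It is the same as $\pi(x^T)$.
Note that these notes treat $p(x^t)$ and $q(x^t)$ as identical, which holds in the ideal case where the learned reverse process exactly inverts the forward process.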

2.2 Derivation

The forward noising process satisfies the Markov property, which gives Equation 1.

\begin{equation}
\begin{split}
q(x^{0,1,2\cdots,T-1,T}) &= q(x^0)\cdot \prod_{t=1}^{T} q(x^t|x^{t-1}) = q(x^0)\cdot q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1}). \\
q(x^{1,2\cdots T}|x^0) &= q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1}).
\end{split}
\end{equation}
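To make the Markov factorization in Equation 1 concrete, here is a minimal sketch that samples a forward trajectory one step at a time. It assumes the Gaussian kernel $q(x^t|x^{t-1}) = \mathcal{N}(x^t; \sqrt{1-\beta_t}\,x^{t-1}, \beta_t I)$ from the paper. The function name `forward_trajectory`, the linear $\beta$ schedule, and $T=1000$ are illustrative assumptions of mine and are not taken from the reference code.

```python
import numpy as np

def forward_trajectory(x0, betas, rng):
    """Sample x^1..x^T from the chain q(x^t | x^{t-1}) (assumed Gaussian kernel)."""
    xs = [np.asarray(x0, dtype=np.float64)]
    for beta in betas:
        noise = rng.standard_normal(xs[-1].shape)
        # Each step depends only on the previous state: the Markov property.
        xs.append(np.sqrt(1.0 - beta) * xs[-1] + np.sqrt(beta) * noise)
    return np.stack(xs)  # shape (T + 1, *x0.shape); row t holds x^t

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # hypothetical noise schedule
x0 = np.full(2000, 3.0)                 # 2000 independent scalar "pixels"
traj = forward_trajectory(x0, betas, rng)

# After many steps the samples should look like a standard Gaussian.
print(traj.shape, traj[-1].mean(), traj[-1].std())
```

Under this schedule the final state $x^T$ has almost forgotten $x^0$, which is what lets the reverse process start from a fixed simple distribution.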

The reverse denoising process is given by Equation 2.

\begin{equation}
p_{\theta}(x^{0,1,2\cdots,T-1,T}) = p_{\theta}(x^T)\cdot \prod_{t=1}^{T} p_{\theta}(x^{t-1}|x^{t}) = p_{\theta}(x^T)\cdot p_{\theta}(x^{T-1}|x^T)\cdot p_{\theta}(x^{T-2}|x^{T-1})\dots p_{\theta}(x^{0}|x^{1}).
\end{equation}
The parameter $\theta$ in Equation 2 is the set of parameters that the deep learning model has to learn. For convenience, $\theta$ is omitted from now on, so Equation 2 is rewritten as Equation 3.
\begin{equation}
p(x^{0,1,2\cdots,T-1,T}) = p(x^T)\cdot \prod_{t=1}^{T} p(x^{t-1}|x^{t}) = p(x^T)\cdot p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\dots p(x^{0}|x^{1}).
\end{equation}

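The goal of reverse denoising is for its end point to match the starting point of forward noising. In other words, we want to maximize $p(x^0)$, the probability that the reverse denoising process produces $x^0$.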

\begin{equation}
\begin{split}
p(x^0) &= \int p(x^0,x^1)\,dx^{1} \quad \text{(marginalizing the joint distribution)} \\
&= \int p(x^1)\cdot p(x^0|x^1)\,dx^1 \quad \text{(product rule)} \\
&= \int \int p(x^1,x^2)\,dx^2 \cdot p(x^0|x^1)\,dx^1 \quad \text{(nested integral)} \\
&= \int \int p(x^2)\cdot p(x^1|x^2)\cdot p(x^0|x^1)\,dx^1 dx^2 \quad \text{(rewritten as a double integral)} \\
&= \vdots \\
&= \int \int \cdots \int p(x^T)\cdot p(x^{T-1}|x^{T})\cdot p(x^{T-2}|x^{T-1})\cdots p(x^0|x^1)\,dx^1 dx^2 \cdots dx^T \\
&= \int p(x^{0,1,2\cdots T})\,dx^{1,2\cdots T} \quad \text{($T$-fold integral)} \\
&= \int dx^{1,2\cdots T}\cdot p(x^{0,1,2\cdots T})\cdot \frac{q(x^{1,2\cdots T}|x^0)}{q(x^{1,2\cdots T}|x^0)} \\
&= \int dx^{1,2\cdots T}\cdot q(x^{1,2\cdots T}|x^0)\cdot \frac{p(x^{0,1,2\cdots T})}{q(x^{1,2\cdots T}|x^0)} \\
&= \int dx^{1,2\cdots T}\cdot q(x^{1,2\cdots T}|x^0)\cdot \frac{p(x^T)\cdot p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\dots p(x^{0}|x^{1})}{q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1})} \\
&= \int dx^{1,2\cdots T}\cdot q(x^{1,2\cdots T}|x^0)\cdot p(x^T)\cdot \frac{p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\dots p(x^{0}|x^{1})}{q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1})} \\
&= \int dx^{1,2\cdots T}\cdot q(x^{1,2\cdots T}|x^0)\cdot p(x^T)\cdot \prod_{t=1}^{T}\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})} \\
&= E_{x^{1,2,\cdots T}\sim q(x^{1,2\cdots T}|x^0)}\bigg[p(x^T)\cdot \prod_{t=1}^{T}\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \quad \text{(rewritten as an expectation)}
\end{split}
\end{equation}
Therefore the parameter $\theta$ (left implicit in Equation 3) should satisfy
\begin{equation}
\theta = \underset{\theta}{\arg\max}\; p(x^0).
\end{equation}
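The expectation form of $p(x^0)$ derived above can be checked numerically on a toy problem. The sketch below is my own illustration, not code from the paper: it uses a scalar model with a single step ($T=1$), where $p(x^1)=\mathcal{N}(0,1)$, $p(x^0|x^1)=\mathcal{N}(a\,x^1, s^2)$ and $q(x^1|x^0)=\mathcal{N}(\sqrt{1-\beta}\,x^0, \beta)$. The values of `a`, `s2`, and `beta` are hypothetical, chosen so that the exact marginal $p(x^0)=\mathcal{N}(0, a^2+s^2)$ is known in closed form.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def estimate_p_x0(x0, a, s2, beta, n_samples, rng):
    """Monte Carlo estimate of p(x^0) = E_q[ p(x^1) p(x^0|x^1) / q(x^1|x^0) ]."""
    q_mean = np.sqrt(1.0 - beta) * x0
    x1 = q_mean + np.sqrt(beta) * rng.standard_normal(n_samples)  # x^1 ~ q(x^1|x^0)
    ratio = normal_pdf(x1, 0.0, 1.0) * normal_pdf(x0, a * x1, s2) / normal_pdf(x1, q_mean, beta)
    return ratio.mean()

rng = np.random.default_rng(0)
a, s2, beta, x0 = 0.5, 0.75, 0.9, 1.0          # hypothetical toy parameters
estimate = estimate_p_x0(x0, a, s2, beta, 200_000, rng)
exact = normal_pdf(x0, 0.0, a ** 2 + s2)        # closed-form marginal of the toy model
print(estimate, exact)
```

The two printed numbers should agree closely, which illustrates that sampling from $q$ and reweighting by the ratio of $p$ to $q$ recovers the intractable integral over $x^{1,2\cdots T}$.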
Equation 4 is evaluated for a single image in the dataset, but a dataset usually contains thousands of images. Suppose the dataset contains $N$ images, with $x^0_n$ denoting the $n$-th image. This gives Equation 6, whose goal is to find a set of parameters $\theta$ that maximizes $L$. Note that $q(x^0)$ is the probability of each image being sampled from the dataset.
\begin{equation}
\begin{split}
L &= \sum_{n=1}^{N} q(x^0_n)\cdot \log p(x^0_n) \\
&= \int dx^0\cdot q(x^0)\cdot \log p(x^0) \\
&= \int dx^0\cdot q(x^0)\cdot \log\bigg[E_{x^{1,2,\cdots T}\sim q(x^{1,2\cdots T}|x^0)}\bigg[p(x^T)\cdot \prod_{t=1}^{T}\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg]\bigg] \\
&\geq \int dx^0\cdot q(x^0)\cdot E_{x^{1,2,\cdots T}\sim q(x^{1,2\cdots T}|x^0)}\log\bigg[p(x^T)\cdot \prod_{t=1}^{T}\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \quad \text{(Jensen's inequality)} \\
&= \int dx^0\cdot q(x^0)\int q(x^{1,2\cdots T}|x^0)\cdot \log\bigg[p(x^T)\cdot \prod_{t=1}^{T}\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg]\cdot dx^{1,2\cdots T} \\
&= \int dx^{0,1,2\cdots T}\cdot q(x^0)\cdot q(x^{1,2\cdots T}|x^0)\cdot \log\bigg[p(x^T)\cdot \prod_{t=1}^{T}\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \\
&= \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[p(x^T)\cdot \prod_{t=1}^{T}\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \\
&= \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\prod_{t=1}^{T}\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] + \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\big[p(x^T)\big] \\
&= K
\end{split}
\end{equation}
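The inequality step in the derivation of $L$ moves the logarithm inside the expectation, so $K$ is a lower bound on $L$. A minimal numerical sketch of that step follows, using a toy scalar one-step model of my own choosing (not from the paper): $p(x^1)=\mathcal{N}(0,1)$, $p(x^0|x^1)=\mathcal{N}(a\,x^1, s^2)$, $q(x^1|x^0)=\mathcal{N}(\sqrt{1-\beta}\,x^0,\beta)$. All parameter values and names are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log-density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

rng = np.random.default_rng(0)
a, s2, beta, x0 = 0.5, 0.75, 0.9, 1.0           # hypothetical toy parameters
q_mean = np.sqrt(1.0 - beta) * x0
x1 = q_mean + np.sqrt(beta) * rng.standard_normal(200_000)  # x^1 ~ q(x^1|x^0)

# log of the ratio p(x^1) p(x^0|x^1) / q(x^1|x^0) for every sample
log_ratio = (log_normal_pdf(x1, 0.0, 1.0)
             + log_normal_pdf(x0, a * x1, s2)
             - log_normal_pdf(x1, q_mean, beta))

log_of_mean = np.log(np.mean(np.exp(log_ratio)))  # estimates log p(x^0)
mean_of_log = np.mean(log_ratio)                  # estimates the lower bound
log_exact = log_normal_pdf(x0, 0.0, a ** 2 + s2)  # closed-form log p(x^0)
print(log_of_mean, mean_of_log, log_exact)
```

The mean of the logs never exceeds the log of the mean, and the gap shrinks as $q(x^1|x^0)$ gets closer to the posterior $p(x^1|x^0)$ of the model.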

This gives the following equation:
\begin{equation}
\begin{split}
K &= \underbrace{\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\prod_{t=1}^{T}\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg]}_{K_1} + \underbrace{\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\big[p(x^T)\big]}_{K_2} \\
&= K_1 + K_2
\end{split}
\end{equation}

First consider the second term of $K$, namely $K_2$.
\begin{equation}
\begin{split}
K_2 &= \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\big[p(x^T)\big] \\
&= \int q(x^0)\cdot q(x^1|x^0)\cdot q(x^2|x^1)\cdots q(x^{T}|x^{T-1})\cdot \log\big[p(x^T)\big]\cdot dx^0 dx^1\cdots dx^{T} \\
&= \int \bigg(\int q(x^1,x^0)\cdot dx^0\bigg)\cdot q(x^2|x^1)\cdots q(x^{T}|x^{T-1})\cdot \log\big[p(x^T)\big]\cdot dx^1\cdots dx^{T} \\
&= \int q(x^1)\cdot q(x^2|x^1)\cdots q(x^{T}|x^{T-1})\cdot \log\big[p(x^T)\big]\cdot dx^1\cdots dx^{T} \\
&= \int q(x^T)\cdot \log\big[p(x^T)\big]\cdot dx^{T} \quad \text{(integrating out $x^1,\dots,x^{T-1}$ in the same way)} \\
&= \int p(x^T)\cdot \log\big[p(x^T)\big]\cdot dx^{T} \quad \text{(using $q(x^T)=p(x^T)$)} \\
&= -H_p(x^T)
\end{split}
\end{equation}
$p(x^T)$ is a Gaussian distribution with mean 0 and variance 1. Following the post "[Normal Distribution Series] Entropy of the Normal Distribution", $K_2$ can be computed as shown below (the value is per dimension of $x^T$).
\begin{equation}
\begin{split}
K_2 &= -H_p(x^T) \\
&= -\bigg(\frac{1}{2}\log\big[2\pi\sigma^2\big] + \frac{1}{2}\bigg) \\
&= -\bigg(\frac{1}{2}\log\big[2\pi\big] + \frac{1}{2}\bigg)
\end{split}
\end{equation}
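The corresponding computation in the code is highlighted by the red box in the figure below.
[Figure: screenshot of the code with the relevant computation in a red box (image missing from the source)]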
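As a quick sanity check of the closed form for $K_2$, the sketch below evaluates the entropy formula for a unit-variance Gaussian and compares it with a Monte Carlo estimate of $\int p(x^T)\log p(x^T)\,dx^T$. This is a standalone NumPy illustration under my own naming (`gaussian_entropy`, `k2_closed_form`, `k2_monte_carlo`) and is not the Theano code from the repository.

```python
import numpy as np

def gaussian_entropy(var):
    """Differential entropy (in nats) of a scalar Gaussian with variance var."""
    return 0.5 * np.log(2.0 * np.pi * var) + 0.5

# Closed form: K_2 = -H_p(x^T) with sigma^2 = 1, per dimension.
k2_closed_form = -gaussian_entropy(1.0)

# Monte Carlo: average log p(x) over samples x ~ p = N(0, 1).
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
log_p = -0.5 * x ** 2 - 0.5 * np.log(2.0 * np.pi)
k2_monte_carlo = log_p.mean()

print(k2_closed_form, k2_monte_carlo)  # closed form is about -1.4189
```

For an image with $D$ independent unit-variance dimensions, the entropy simply adds up across dimensions, so the total is $D$ times the per-dimension value.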
Next consider $K_1$. Note that the paper states that, to avoid edge effects, the last reverse step is forced to satisfy $p(x^{0}|x^1)=q(x^1|x^{0})$. The $t=1$ term of the sum therefore has a log ratio of zero and drops out in the fourth line below.

\begin{equation}
\begin{split}
K_1 &= \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\prod_{t=1}^{T}\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \\
&= \sum_{t=1}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \\
&= \sum_{t=2}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] + \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{0}|x^1)}{q(x^1|x^{0})}\bigg] \\
&= \sum_{t=2}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\bigg] \\
&= \sum_{t=2}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t})}\cdot \frac{q(x^{t-1})}{q(x^t)}\bigg] \quad \text{(Bayes' rule)} \\
&= \sum_{t=2}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t},x^0)}\cdot \frac{q(x^{t-1}|x^0)}{q(x^t|x^0)}\bigg] \quad \text{(Markov property: condition on $x^0$)} \\
&= \sum_{t=2}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t},x^0)}\bigg] + \sum_{t=2}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{q(x^{t-1}|x^0)}{q(x^t|x^0)}\bigg] \\
&= \sum_{t=2}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t},x^0)}\bigg] + \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\prod_{t=2}^{T}\frac{q(x^{t-1}|x^0)}{q(x^t|x^0)}\bigg] \\
&= \sum_{t=2}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t},x^0)}\bigg] + \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{q(x^{1}|x^0)}{q(x^T|x^0)}\bigg] \\
&= \sum_{t=2}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t},x^0)}\bigg] + \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[q(x^{1}|x^0)\bigg] - \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[q(x^T|x^0)\bigg] \\
&= \sum_{t=2}^{T}\int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[\frac{p(x^{t-1}|x^t)}{q(x^{t-1}|x^{t},x^0)}\bigg] + \int dx^{0}dx^{1}\cdots dx^{T}\cdot q(x^{0})\cdot q(x^1|x^0)\cdot q(x^2|x^1)\cdots q(x^T|x^{T-1})\cdot \log\bigg[q(x^{1}|x^0)\bigg] - \int dx^{0,1,2\cdots T}\cdot q(x^{0,1,2\cdots T})\cdot \log\bigg[q(x^T|x^0)\bigg]
\end{split}
\end{equation}
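The step in the $K_1$ derivation that collapses $\prod_{t=2}^{T} q(x^{t-1}|x^0)/q(x^t|x^0)$ into $q(x^1|x^0)/q(x^T|x^0)$ is a telescoping product. The sketch below checks it numerically along one sampled trajectory. It assumes the Gaussian forward kernel $\mathcal{N}(\sqrt{1-\beta_t}\,x^{t-1},\beta_t)$, for which $q(x^t|x^0)=\mathcal{N}(\sqrt{\bar\alpha_t}\,x^0, 1-\bar\alpha_t)$ with $\bar\alpha_t=\prod_{s\le t}(1-\beta_s)$. The schedule, $T=50$, and all names are hypothetical choices for illustration.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log-density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-3, 0.2, T)        # hypothetical schedule beta_1..beta_T
alpha_bar = np.cumprod(1.0 - betas)      # alpha_bar[t - 1] = prod_{s <= t} (1 - beta_s)

# Sample one forward trajectory; xs[t] holds x^t.
x0 = 1.5
xs = [x0]
for beta in betas:
    xs.append(np.sqrt(1.0 - beta) * xs[-1] + np.sqrt(beta) * rng.standard_normal())
xs = np.array(xs)

def log_q_t_given_0(t):
    """log q(x^t | x^0) for 1 <= t <= T, using the closed-form marginal."""
    return log_normal_pdf(xs[t], np.sqrt(alpha_bar[t - 1]) * x0, 1.0 - alpha_bar[t - 1])

# Sum of log ratios over t = 2..T versus the two surviving end terms.
lhs = sum(log_q_t_given_0(t - 1) - log_q_t_given_0(t) for t in range(2, T + 1))
rhs = log_q_t_given_0(1) - log_q_t_given_0(T)
print(lhs, rhs)
```

All intermediate terms cancel in pairs, so only the $t=1$ and $t=T$ marginals remain, which is why the last lines of the derivation contain just $\log q(x^1|x^0)$ and $\log q(x^T|x^0)$.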
