
# Code Verification of CNN Forward and Backward Propagation, from *Gradient Derivation and Code Verification of Neural Networks*


```
import tensorflow as tf
import numpy as np

np.random.seed(0)
```

```
def get_crossentropy(y_pred, y_true):
    return -tf.reduce_sum(y_true * tf.math.log(y_pred))
```

```
with tf.GradientTape(persistent=True) as t:
    # -------input-----------
    x = tf.constant(np.random.randn(1, 9, 9, 1).astype(np.float32))
    y_true = np.array([[0.3, 0.5, 0.2]]).astype(np.float32)
    t.watch(x)
    # -----conv2d l1----------
    l1 = tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3), strides=2)

    z_l1 = l1(x)
    t.watch([z_l1])
    a_l1 = tf.nn.relu(z_l1)
    t.watch([a_l1])
    # -----max pooling--------
    l2 = tf.keras.layers.MaxPool2D(pool_size=(2, 2))

    z_l2 = l2(a_l1)
    t.watch([z_l2])
    a_l2 = tf.keras.layers.Flatten()(z_l2)
    t.watch([a_l2])
    # ---------FNN------------
    l3 = tf.keras.layers.Dense(3)

    z_l3 = l3(a_l2)
    t.watch([z_l3])
    a_l3 = tf.math.softmax(z_l3)
    t.watch([a_l3])
    # ---------loss----------
    loss = get_crossentropy(y_pred=a_l3, y_true=y_true)
```

In the input section, x is a tensor of shape (1, 9, 9, 1), which can be read as a single-channel 9×9 image. The label y_true is a 3-dimensional probability vector. The code above considers only a single (x, y_true) sample.
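As a quick sanity check on the shapes involved, the spatial size of each stage follows the standard 'valid' output-size formula (a minimal sketch; the helper name is ours):

```python
def conv_out_size(n, k, stride):
    """Spatial size of a 'valid' convolution/pooling: floor((n - k) / stride) + 1."""
    return (n - k) // stride + 1

h1 = conv_out_size(9, 3, 2)   # Conv2D, 3x3 kernel, stride 2, on the 9x9 input
h2 = conv_out_size(h1, 2, 2)  # MaxPool2D, 2x2 window (default stride = window size)
print(h1, h2)                 # 4 2 -> z_l1 is 4x4, z_l2 is 2x2
```

So flattening z_l2 yields 4 features, which the Dense layer maps to the 3 classes.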

——— Forward propagation verification ———

```
np.squeeze(l1.kernel.numpy())
Out[4]:
array([[-0.3447126 , -0.23770776, -0.20545131],
       [ 0.40415084, -0.56749415,  0.13746339],
       [-0.5106965 , -0.36734173, -0.18053415]], dtype=float32)
```

```
l1.bias.numpy()
Out[8]: array([0.], dtype=float32)
```

```
np.squeeze(x)
Out[6]:
array([[ 1.7640524 ,  0.4001572 ,  0.978738  ,  2.2408931 ,  1.867558  ,
        -0.9772779 ,  0.95008844, -0.1513572 , -0.10321885],
       [ 0.41059852,  0.14404356,  1.4542735 ,  0.7610377 ,  0.12167501,
         0.44386324,  0.33367434,  1.4940791 , -0.20515826],
       [ 0.3130677 , -0.85409576, -2.5529897 ,  0.6536186 ,  0.8644362 ,
        -0.742165  ,  2.2697546 , -1.4543657 ,  0.04575852],
       [-0.18718386,  1.5327792 ,  1.4693588 ,  0.15494743,  0.37816253,
        -0.88778573, -1.9807965 , -0.34791216,  0.15634897],
       [ 1.2302907 ,  1.2023798 , -0.3873268 , -0.30230275, -1.048553  ,
        -1.420018  , -1.7062702 ,  1.9507754 , -0.5096522 ],
       [-0.4380743 , -1.2527953 ,  0.7774904 , -1.6138978 , -0.21274029,
        -0.89546657,  0.3869025 , -0.51080513, -1.1806322 ],
       [-0.02818223,  0.42833188,  0.06651722,  0.3024719 , -0.6343221 ,
        -0.36274117, -0.67246044, -0.35955316, -0.8131463 ],
       [-1.7262826 ,  0.17742614, -0.40178093, -1.6301984 ,  0.46278226,
        -0.9072984 ,  0.0519454 ,  0.7290906 ,  0.12898292],
       [ 1.1394007 , -1.2348258 ,  0.40234163, -0.6848101 , -0.87079716,
        -0.5788497 , -0.31155252,  0.05616534, -1.1651498 ]],
      dtype=float32)
```

```
np.squeeze(z_l1)
Out[7]:
array([[-0.00542112, -0.17352474, -1.3421125 , -1.6447177 ],
       [-1.1239526 ,  1.6031268 ,  1.1616374 , -0.78091574],
       [-0.14451274,  1.5910958 ,  2.1035302 ,  1.1354219 ],
       [-1.1602874 ,  1.0651501 ,  1.8656987 ,  0.4581319 ]],
      dtype=float32)
```

Manually convolving a few stride-2 windows of x with the kernel reproduces z_l1[0, 0], z_l1[1, 0] and z_l1[0, 1] respectively:

```
np.sum(np.squeeze(x)[0:3, 0:3] * np.squeeze(l1.kernel.numpy()))
Out[8]: -0.0054210722

np.sum(np.squeeze(x)[2:5, 0:3] * np.squeeze(l1.kernel.numpy()))
Out[9]: -1.1239524

np.sum(np.squeeze(x)[0:3, 2:5] * np.squeeze(l1.kernel.numpy()))
Out[10]: -0.17352472
```
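These spot-checks can be extended to the entire feature map with a hand-rolled convolution. A minimal NumPy sketch of a single-channel, 'valid', strided cross-correlation (which is what Conv2D actually computes; the function name is ours):

```python
import numpy as np

def conv2d_valid(x, w, stride=1):
    """Single-channel 'valid' cross-correlation, as computed by Conv2D."""
    kh, kw = w.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    z = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            z[i, j] = np.sum(x[r:r + kh, c:c + kw] * w)
    return z
```

Applied with `stride=2` to `np.squeeze(x)` and the squeezed kernel, this should reproduce the whole 4×4 `z_l1` map at once (no bias term is needed here, since `l1.bias` is zero).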

```
np.squeeze(a_l1)
Out[11]:
array([[0.       , 0.       , 0.       , 0.       ],
       [0.       , 1.6031268, 1.1616374, 0.       ],
       [0.       , 1.5910958, 2.1035302, 1.1354219],
       [0.       , 1.0651501, 1.8656987, 0.4581319]], dtype=float32)

np.squeeze(z_l2)
Out[12]:
array([[1.6031268, 1.1616374],
       [1.5910958, 2.1035302]], dtype=float32)
```
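Both remaining forward steps are easy to reproduce by hand. A NumPy sketch (the reshape-based pooling assumes non-overlapping 2×2 windows, as configured above; the helper names are ours):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def maxpool2x2(a):
    """Non-overlapping 2x2 max pooling via a reshape trick."""
    h, w = a.shape[0] // 2, a.shape[1] // 2
    return a.reshape(h, 2, w, 2).max(axis=(1, 3))

z = np.array([[-1.0,  2.0,  0.5, -0.3],
              [ 3.0, -4.0,  1.0,  2.5],
              [ 0.0,  1.5, -2.0,  0.1],
              [-0.5,  0.2,  0.4, -1.0]])
pooled = maxpool2x2(relu(z))   # each entry is the max of one 2x2 window of relu(z)
```

Feeding the printed `z_l1` values through `maxpool2x2(relu(...))` should reproduce `z_l2` exactly.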

——— Backward gradient verification ———

```
# -----dl_dz3------
dl_dz3 = t.gradient(loss, z_l3)   # autodiff result from the tape, for comparison
my_dl_dz3 = a_l3 - y_true
```

```
dl_dz3
Out[13]: <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[ 0.20361423, -0.4315467 ,  0.22793245]], dtype=float32)>
my_dl_dz3
Out[14]: <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[ 0.20361423, -0.4315467 ,  0.22793244]], dtype=float32)>
```

$\frac{\partial l}{\partial \boldsymbol{z}_{l3}}$ should satisfy:

$\frac{\partial l}{\partial \boldsymbol{z}_{l3}} = \boldsymbol{a}_{l3} - \boldsymbol{y}_{true}$

(Skipping the verification of the FNN backward gradients means that at this point we take $\frac{\partial l}{\partial \boldsymbol{z}_{l2}}$ as already computable.)
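The identity above can also be checked numerically, independent of TensorFlow, by comparing the analytic gradient of softmax + cross-entropy with central finite differences (a self-contained sketch; the concrete values are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())            # shift for numerical stability
    return e / e.sum()

def ce_loss(z, y):
    return -np.sum(y * np.log(softmax(z)))

z = np.array([0.2, -1.0, 0.5])
y = np.array([0.3, 0.5, 0.2])          # sums to 1, as y_true does above

analytic = softmax(z) - y              # claimed: dl/dz = a - y
eps = 1e-6
numeric = np.array([
    (ce_loss(z + eps * e_i, y) - ce_loss(z - eps * e_i, y)) / (2 * eps)
    for e_i in np.eye(3)
])
```

The two gradients agree to finite-difference accuracy; note the identity relies on the entries of y summing to 1.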

```
inverse_mp = np.squeeze(t.gradient(loss, z_l1))

inverse_mp
Out[15]:
array([[ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.08187968, -0.07518297,  0.        ],
       [ 0.        , -0.2259186 ,  0.5417712 ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ]],
      dtype=float32)
```

$\boldsymbol{\delta}_{k}^{l-1} = upsample\left( \boldsymbol{\delta}_{k}^{l} \right) \odot \sigma^{\prime}\left( \boldsymbol{z}_{k}^{l-1} \right)$

```
flat_z_l1 = np.squeeze(z_l1)

flat_z_l1
Out[16]:
array([[-0.00542112, -0.17352474, -1.3421125 , -1.6447177 ],
       [-1.1239526 ,  1.6031268 ,  1.1616374 , -0.78091574],
       [-0.14451274,  1.5910958 ,  2.1035302 ,  1.1354219 ],
       [-1.1602874 ,  1.0651501 ,  1.8656987 ,  0.4581319 ]],
      dtype=float32)
```

```
# After the ReLU activation, z_l1 becomes the following a_l1
flat_a_l1 = np.squeeze(a_l1)

flat_a_l1
Out[17]:
array([[0.       , 0.       , 0.       , 0.       ],
       [0.       , 1.6031268, 1.1616374, 0.       ],
       [0.       , 1.5910958, 2.1035302, 1.1354219],
       [0.       , 1.0651501, 1.8656987, 0.4581319]], dtype=float32)
```

```
# Pooling a_l1 gives the result below; we note which position each pooled
# element came from, for the up-sampling later
flat_z_l2 = np.squeeze(z_l2)

flat_z_l2
Out[18]:
array([[1.6031268, 1.1616374],
       [1.5910958, 2.1035302]], dtype=float32)
```

```
# dl_dz2: the gradient of the loss w.r.t. the pooling output, read from the tape
dl_dz2 = np.squeeze(t.gradient(loss, z_l2))

dl_dz2
Out[23]:
array([[ 0.08187968, -0.07518297],
       [-0.2259186 ,  0.5417712 ]], dtype=float32)
```
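The `upsample` step of the formula above can be reproduced directly in NumPy: each entry of dl_dz2 is routed back to the argmax position of its 2×2 pooling window in a_l1, and every other position gets zero (a sketch; the function name is ours, and the arrays restate the values printed above). Because every position that receives a gradient had a positive pre-activation, multiplying by $\sigma^{\prime}(z_{l1})$ changes nothing here, so the result should match `t.gradient(loss, z_l1)`.

```python
import numpy as np

def maxpool_upsample(a, delta, pool=2):
    """Route each pooled gradient back to the argmax position of its
    pooling window in a; all other positions receive zero."""
    out = np.zeros_like(a)
    for i in range(delta.shape[0]):
        for j in range(delta.shape[1]):
            win = a[i*pool:(i+1)*pool, j*pool:(j+1)*pool]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i*pool + r, j*pool + c] = delta[i, j]
    return out

# the a_l1 and dl_dz2 values printed above
a_l1 = np.array([[0., 0.,        0.,        0.],
                 [0., 1.6031268, 1.1616374, 0.],
                 [0., 1.5910958, 2.1035302, 1.1354219],
                 [0., 1.0651501, 1.8656987, 0.4581319]])
dl_dz2 = np.array([[ 0.08187968, -0.07518297],
                   [-0.2259186 ,  0.5417712 ]])
up = maxpool_upsample(a_l1, dl_dz2)
```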

```
dl_dz1 = t.gradient(loss, z_l1)

np.squeeze(dl_dz1)
Out[26]:
array([[ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.08187968, -0.07518297,  0.        ],
       [ 0.        , -0.2259186 ,  0.5417712 ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ]],
      dtype=float32)
```

$\frac{\partial l}{\partial \boldsymbol{x}} = \frac{\partial l}{\partial \boldsymbol{z}_{l1}} * rot180\left( \boldsymbol{W}^{1} \right)$

```
# dl_dz1 is known from above
# -----dl_dx--------
dl_dx = t.gradient(loss, x)

np.squeeze(dl_dx)
Out[5]:
array([[ 0.        ,  0.        ,  0.        ,  0.        , -0.02591492,
         0.0461329 , -0.02298165,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.00783426,
        -0.0234191 ,  0.00530995,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.05378128,
        -0.02789585,  0.07432017,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.02716039, -0.04835004,  0.02408614],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        , -0.00821077,  0.02454463, -0.00556515],
       [-0.02953471,  0.05257672, -0.02619171,  0.        ,  0.        ,
         0.        , -0.05636601,  0.02923652, -0.07789201],
       [ 0.00892854, -0.02669027,  0.00605165,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.06129343, -0.03179233,  0.08470119,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ]],
      dtype=float32)
```

• In the forward pass:

$conv2D\left( \boldsymbol{a}^{l-1}, \boldsymbol{W}^{l}, \text{'valid'}, stride = 2 \right) = \begin{bmatrix} z_{11} & 0 & z_{12} \\ 0 & 0 & 0 \\ z_{21} & 0 & z_{22} \end{bmatrix} \overset{down\ sampling}{\longrightarrow} \boldsymbol{z}^{l} = \begin{bmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{bmatrix}$

• For the backward gradients:

The derivative of the loss $l$ with respect to the matrix $\boldsymbol{z}^{l}$, i.e. $\boldsymbol{\delta}^{l}$, is up-sampled and then convolved with $rot180\left( \boldsymbol{W}^{l} \right)$ in stride-1 'full' mode; the result is exactly the final result of Example 2 above, i.e.:

$\boldsymbol{\delta}^{l} = \begin{bmatrix} \delta_{11} & \delta_{12} \\ \delta_{21} & \delta_{22} \end{bmatrix} \overset{up\ sampling}{\longrightarrow} \begin{bmatrix} \delta_{11} & 0 & \delta_{12} \\ 0 & 0 & 0 \\ \delta_{21} & 0 & \delta_{22} \end{bmatrix}$

$\frac{\partial l}{\partial \boldsymbol{a}^{l-1}} = conv2D\left( \begin{bmatrix} \delta_{11} & 0 & \delta_{12} \\ 0 & 0 & 0 \\ \delta_{21} & 0 & \delta_{22} \end{bmatrix}, rot180\left( \boldsymbol{W}^{l} \right), \text{'full'} \right)$
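The up-sample-then-full-convolve recipe can be sketched in plain NumPy (illustrative names; `np.rot90(w, 2)` performs the rot180, and 'full' mode is zero-padding by k−1 on every side):

```python
import numpy as np

def conv2d_full(delta_up, w):
    """Stride-1 'full'-mode convolution of the (up-sampled) delta with
    rot180(w): pad by k-1 zeros on every side, then slide the rotated kernel."""
    k = w.shape[0]
    w_rot = np.rot90(w, 2)                 # rot180(W)
    p = np.pad(delta_up, k - 1)            # 'full' padding
    out = np.zeros((p.shape[0] - k + 1, p.shape[1] - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(p[i:i+k, j:j+k] * w_rot)
    return out
```

A quick sanity check of the recipe: a single delta routed through one 3×3 window gives $\partial l / \partial x = \delta \cdot W$, and `conv2d_full` reproduces exactly that.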

```
inverse_conv2d = tf.keras.layers.Conv2DTranspose(filters=1, kernel_size=(3, 3), strides=2)
inverse_conv2d.build(dl_dz1.shape)

inverse_conv2d.kernel = l1.kernel
inverse_conv2d.bias = l1.bias
```

```
my_dl_dx = np.squeeze(inverse_conv2d(dl_dz1))

my_dl_dx
Out[6]:
array([[ 0.        ,  0.        ,  0.        ,  0.        , -0.02591492,
         0.0461329 , -0.02298165,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.00783426,
        -0.0234191 ,  0.00530995,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.05378128,
        -0.02789585,  0.07432017,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.02716039, -0.04835004,  0.02408614],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        , -0.00821077,  0.02454463, -0.00556515],
       [-0.02953471,  0.05257672, -0.02619171,  0.        ,  0.        ,
         0.        , -0.05636601,  0.02923652, -0.07789201],
       [ 0.00892854, -0.02669027,  0.00605165,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.06129343, -0.03179233,  0.08470119,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ]],
      dtype=float32)
```

$\frac{\partial l}{\partial \boldsymbol{W}^{l}} = \sum_{i}{\sum_{j}{\delta_{ij}^{l}\,\sigma\left( z_{i+x,\, j+y}^{l-1} \right)}} = conv2D\left( \sigma\left( \boldsymbol{z}^{l-1} \right), \boldsymbol{\delta}^{l}, \text{'valid'} \right)$
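The right-hand side can be sketched in NumPy: the (up-sampled) delta acts as the kernel of a stride-1 'valid' convolution over the layer input (illustrative names; for the first layer, $\sigma(\boldsymbol{z}^{l-1})$ is simply the input x, which is what the TF code below exploits):

```python
import numpy as np

def conv2d_weightgrad(a_prev, delta_up):
    """Stride-1 'valid' cross-correlation of the layer input with the
    (up-sampled) delta used as the kernel; output has the weight's shape."""
    k0, k1 = delta_up.shape
    out = np.zeros((a_prev.shape[0] - k0 + 1, a_prev.shape[1] - k1 + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a_prev[i:i+k0, j:j+k1] * delta_up)
    return out
```

On a 9×9 input with a 7×7 up-sampled delta this produces a 3×3 result, matching the shape of the conv kernel W.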

```
np.squeeze(dl_dz1)
Out[13]:
array([[ 0.        ,  0.        ,  0.13701844,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , -0.14360355],
       [ 0.15615712,  0.        ,  0.        ,  0.        ]],
      dtype=float32)
```

```
new_kernel = np.squeeze(dl_dz1)
col = np.zeros(4)
new_kernel = np.column_stack((new_kernel[:, 0], col, new_kernel[:, 1], col, new_kernel[:, 2], col, new_kernel[:, 3]))
row = np.zeros(7)
new_kernel = np.row_stack((new_kernel[0, :], row, new_kernel[1, :], row, new_kernel[2, :], row, new_kernel[3, :]))

np.squeeze(new_kernel)
Out[15]:
array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.13701844,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        , -0.14360355],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ],
       [ 0.15615712,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ]], dtype=float32)
```

```
conv2d = tf.keras.layers.Conv2D(filters=1, kernel_size=new_kernel.shape)
conv2d.build(inverse_conv2d(dl_dz1).shape)

new_kernel = new_kernel[:, :, np.newaxis, np.newaxis].astype(np.float32)
conv2d.kernel = new_kernel
```

```
dl_dW = np.squeeze(t.gradient(loss, l1.kernel))

dl_dW
Out[3]:
array([[-0.10748424, -0.00609609,  0.364539  ],
       [ 0.1810095 , -0.13556898, -0.19335817],
       [-0.135235  , -0.18566257, -0.23072638]], dtype=float32)
```

```
my_dl_dW = np.squeeze(conv2d(x))

my_dl_dW
Out[4]:
array([[-0.10748424, -0.00609609,  0.364539  ],
       [ 0.1810095 , -0.13556898, -0.19335817],
       [-0.135235  , -0.18566257, -0.23072638]], dtype=float32)
```

```
dl_db = np.squeeze(t.gradient(loss, l1.bias))
dl_db
Out[5]: array(0.15956555, dtype=float32)

my_dl_db = np.sum(new_kernel.astype(np.float32))
my_dl_db
Out[6]: 0.15956555
```

(Reposting is welcome; please credit the source. Comments and discussion are welcome: lxwalyw@gmail.com)