Backpropagation
1 Forward Propagation
This section is a brief summary of forward propagation.
We set $a^{[0]} = x$ as the input to the network, and let $\ell = 1,2,\dots,N$, where $N$ is the number of layers. Then, for each layer, we have
$$z^{[\ell]} = W^{[\ell]} a^{[\ell-1]} + b^{[\ell]}, \qquad a^{[\ell]} = g^{[\ell]}\left(z^{[\ell]}\right),$$
where $g^{[\ell]}$ is the same for all layers except the last. For the last layer, the choice depends on the task:
1 regression: $g(x) = x$
2 binary classification: $g(x) = \mathrm{sigmoid}(x)$
3 multi-class classification: $g(x) = \mathrm{softmax}(x)$
Finally, we obtain the output of the network $a^{[N]}$ and compute its loss.
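The forward pass above can be sketched in NumPy. The hidden activation is assumed to be ReLU here, since the notes leave $g^{[\ell]}$ for hidden layers unspecified; the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def forward(x, Ws, bs, out_activation=sigmoid):
    """Forward propagation through N layers.

    Ws, bs are lists of weight matrices W^{[l]} and bias vectors b^{[l]}.
    Hidden layers use ReLU (an assumed choice); the last layer uses
    `out_activation`, chosen per the task (identity, sigmoid, or softmax).
    """
    a = x  # a^{[0]} = x
    for l, (W, b) in enumerate(zip(Ws, bs), start=1):
        z = W @ a + b  # z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}
        g = out_activation if l == len(Ws) else relu
        a = g(z)       # a^{[l]} = g^{[l]}(z^{[l]})
    return a
```

With a single identity-weight layer and zero bias, the output is just the sigmoid of the input.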
For regression, we have:
$$J = \frac{1}{2}\left(a^{[N]} - y\right)^2$$
For binary classification, we have:
$$J = -\left(y \log a^{[N]} + (1-y)\log\left(1 - a^{[N]}\right)\right)$$
For multi-class classification, we have:
$$J = \mathrm{CE}(y, \hat{y}), \quad \text{where } \hat{y} = a^{[N]}.$$
Note that for multi-class, since $\hat{y}$ is a $k$-dimensional vector, the cross-entropy loss expands to
$$\mathrm{CE}(y, \hat{y}) = -\sum_{j=1}^{k} y_j \log \hat{y}_j,$$
where $y$ is the one-hot label vector.
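The three losses can be sketched as follows. The $\frac{1}{2}$ factor in the squared error and the clipping constant `eps` are assumptions (the clipping only guards against $\log 0$ numerically).

```python
import numpy as np

def mse_loss(a, y):
    # Regression: squared error with a 1/2 factor (assumed convention,
    # which makes the gradient simply a - y).
    return 0.5 * np.sum((a - y) ** 2)

def binary_ce(a, y, eps=1e-12):
    # Binary classification: cross-entropy with a sigmoid output a in (0, 1).
    a = np.clip(a, eps, 1.0 - eps)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def cross_entropy(y, y_hat, eps=1e-12):
    # Multi-class: y is a one-hot k-vector, y_hat a softmax output.
    return -np.sum(y * np.log(np.clip(y_hat, eps, None)))
```

For a maximally uncertain binary prediction of 0.5, both cross-entropy losses reduce to $-\log 0.5 \approx 0.693$.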
2 Backpropagation
We define the error at layer $\ell$ as:
$$\delta^{[\ell]} := \frac{\partial J}{\partial z^{[\ell]}}$$
So computing the gradient for any layer takes three steps:
1 For the output layer $N$, apply the chain rule:
$$\delta^{[N]} = \frac{\partial J}{\partial a^{[N]}} \cdot \frac{\partial a^{[N]}}{\partial z^{[N]}}$$
Since the softmax function is not applied element-wise, its derivative is computed as a whole; combined with the cross-entropy loss, this gives $\delta^{[N]} = \hat{y} - y$ directly. The sigmoid, by contrast, is applied element-wise, so we compute:
$$\delta^{[N]} = \frac{\partial J}{\partial a^{[N]}} \odot g'\left(z^{[N]}\right)$$
Note that $\odot$ denotes the element-wise (Hadamard) product.
2 For $\ell = N-1, N-2, \dots, 1$, we have:
$$\delta^{[\ell]} = \left(W^{[\ell+1]\top} \delta^{[\ell+1]}\right) \odot g'\left(z^{[\ell]}\right)$$
3 For each layer $\ell$, the gradients are:
$$\frac{\partial J}{\partial W^{[\ell]}} = \delta^{[\ell]} a^{[\ell-1]\top}, \qquad \frac{\partial J}{\partial b^{[\ell]}} = \delta^{[\ell]}$$
These expressions can be translated directly into code as update formulas.
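The three steps can be sketched end-to-end. This sketch assumes sigmoid hidden layers and a softmax output with cross-entropy loss (the notes allow other choices), which makes the output error simply $\hat{y} - y$; all function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(x, y, Ws, bs):
    """Gradients of the cross-entropy loss w.r.t. every W^{[l]} and b^{[l]}.

    Assumes sigmoid hidden layers and a softmax output (a sketch, not the
    only valid configuration). Returns lists dWs, dbs matching Ws, bs.
    """
    # Forward pass, caching a^{[l]} for every layer.
    N = len(Ws)
    a_s = [x]
    a = x
    for l, (W, b) in enumerate(zip(Ws, bs), start=1):
        z = W @ a + b
        if l == N:
            e = np.exp(z - z.max())   # softmax output layer
            a = e / e.sum()
        else:
            a = sigmoid(z)
        a_s.append(a)

    # Step 1: output-layer error (softmax + cross-entropy => y_hat - y).
    delta = a_s[-1] - y

    dWs, dbs = [None] * N, [None] * N
    for l in range(N, 0, -1):
        # Step 3: per-layer gradients dJ/dW = delta a^{T}, dJ/db = delta.
        dWs[l - 1] = np.outer(delta, a_s[l - 1])
        dbs[l - 1] = delta
        if l > 1:
            # Step 2: propagate error; sigmoid' = a(1 - a), element-wise.
            g_prime = a_s[l - 1] * (1 - a_s[l - 1])
            delta = (Ws[l - 1].T @ delta) * g_prime
    return dWs, dbs
```

A quick sanity check is to compare one entry of the analytic gradient against a central finite difference of the loss.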