Tutorial 96 - Deep Learning terminology explained - Back propagation and optimizers
Code associated with these tutorials can be downloaded from here:
The essence of deep learning is to find the best weights (and biases) for the network, i.e., the ones that minimize the error (loss).
This is done via an iterative process where weights are updated in each iteration in a direction that minimizes the loss.
To find this direction we need the slope of the loss function with respect to a given weight. This is obtained by computing the derivative (gradient).
Computing these derivatives for millions of weights is computationally expensive.
Backpropagation makes the computation feasible by applying the chain rule from calculus to find the derivative of the loss with respect to every weight in the network.
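To make the chain rule concrete, here is a minimal sketch (not the downloadable tutorial code) for a single weight feeding a sigmoid neuron with a squared-error loss; the input, target and weight values are hypothetical.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y_true, w = 0.5, 1.0, 0.8           # hypothetical sample, target and weight

# Forward pass
z = w * x                              # weighted input
y_pred = sigmoid(z)                    # activation
loss = 0.5 * (y_pred - y_true) ** 2    # squared-error loss

# Backward pass, chain rule: dL/dw = dL/dy_pred * dy_pred/dz * dz/dw
dL_dy = y_pred - y_true
dy_dz = y_pred * (1.0 - y_pred)        # derivative of the sigmoid
dz_dw = x
dL_dw = dL_dy * dy_dz * dz_dw          # gradient of the loss w.r.t. the weight
print(loss, dL_dw)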
Gradient descent is the general term for calculating the gradient and then updating the weights in the direction that reduces the loss.
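As a sketch, a single gradient-descent step simply subtracts the learning rate times the gradient from the weight; the numbers below are hypothetical placeholders.

w = 0.8                 # current weight
dL_dw = -0.05           # gradient of the loss w.r.t. this weight (e.g. from backpropagation)
learning_rate = 0.1     # step size (hyperparameter)
w = w - learning_rate * dL_dw    # step against the gradient to reduce the loss
print(w)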
__________________________________________
In Gradient Descent (GD) optimization, the weights are updated once after each epoch (i.e., after the network has seen the entire training dataset).
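A minimal sketch of full-batch gradient descent on a small synthetic linear-regression problem (again, not the tutorial's code): the gradient is averaged over the entire training set, so the weights are updated only once per epoch.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))                     # entire (synthetic) training dataset
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(50):
    y_pred = X[:, 0] * w + b                      # forward pass over ALL samples
    error = y_pred - y                            # loss is 0.5 * mean(error**2)
    grad_w = np.mean(error * X[:, 0])             # gradient averaged over the full dataset
    grad_b = np.mean(error)
    w -= lr * grad_w                              # one weight update per epoch
    b -= lr * grad_b
print(w, b)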
For large