Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions (Paper Explained)
#imle #backpropagation #discrete
Backpropagation is the workhorse of deep learning, but it only works for differentiable functions, i.e. functions amenable to the chain rule. Discrete algorithms have no useful derivative: their output is piecewise constant in the input, so the gradient is zero almost everywhere and undefined at the jumps. As a result, deep networks that contain such algorithms cannot be trained effectively with backpropagation. This paper presents a method to incorporate a large class of algorithms, formulated as discrete exponential family distributions, into deep networks, and derives gradient estimates that can easily be used in end-to-end backpropagation. This lets components such as combinatorial optimizers be a native part of a network's forward pass.
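To make the idea concrete, here is a minimal sketch of an I-MLE-style gradient estimator, assuming a top-k selection problem as the discrete algorithm. The forward pass calls a MAP solver on (optionally perturbed) parameters θ; the backward pass builds target parameters θ' = θ − λ·∂L/∂z and returns the difference of the two MAP solutions as the estimate of ∂L/∂θ. The names `top_k_map`, `gumbel_noise` and `imle_gradient`, the default λ, and the use of Gumbel noise as the perturbation are all illustrative assumptions, not the authors' reference implementation.

```python
import functools
import math
import random
from typing import Callable, List, Optional

Vector = List[float]


def top_k_map(theta: Vector, k: int) -> Vector:
    """MAP solver for 'pick exactly k items': argmax_z <theta, z>
    over z in {0,1}^n with sum(z) == k, i.e. mark the k largest scores."""
    order = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)
    z = [0.0] * len(theta)
    for i in order[:k]:
        z[i] = 1.0
    return z


def gumbel_noise(n: int, rng: random.Random) -> Vector:
    """Standard Gumbel samples -log(-log(U)); an assumed stand-in perturbation."""
    out = []
    for _ in range(n):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        out.append(-math.log(-math.log(u)))
    return out


def imle_gradient(
    theta: Vector,
    dL_dz: Vector,
    map_solver: Callable[[Vector], Vector],
    lam: float = 10.0,
    noise: Optional[Vector] = None,
) -> Vector:
    """Perturb-and-MAP estimate of dL/dtheta in the style of I-MLE:
    z        = MAP(theta + eps)
    theta'   = theta - lam * dL/dz        (parameters of the target distribution)
    estimate = MAP(theta + eps) - MAP(theta' + eps)
    The same noise sample eps is reused for both solver calls."""
    eps = noise if noise is not None else [0.0] * len(theta)
    z = map_solver([t + e for t, e in zip(theta, eps)])
    theta_target = [t - lam * g for t, g in zip(theta, dL_dz)]
    z_target = map_solver([t + e for t, e in zip(theta_target, eps)])
    return [a - b for a, b in zip(z, z_target)]


if __name__ == "__main__":
    theta = [3.0, 2.0, 1.0, 0.0]
    solver = functools.partial(top_k_map, k=2)
    z = solver(theta)                      # [1.0, 1.0, 0.0, 0.0]
    target = [0.0, 1.0, 1.0, 0.0]          # downstream loss L = ||z - target||^2
    dL_dz = [2.0 * (zi - ti) for zi, ti in zip(z, target)]
    print(imle_gradient(theta, dL_dz, solver, lam=1.0))
    # -> [1.0, 0.0, -1.0, 0.0]: a descent step lowers theta[0] and raises theta[2]
    noise = gumbel_noise(len(theta), random.Random(0))
    print(imle_gradient(theta, dL_dz, solver, lam=1.0, noise=noise))
```

Note the role of λ: with too small a value (e.g. `lam=0.1` in the example), θ' yields the same top-k set as θ and the estimate is all zeros, so λ has to be large enough for the target parameters to actually change the solver's output.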
OUTLINE:
0:00 - Intro & Overview
4:25 - Sponsor: Weights & Biases
6:15 - Problem Setup & Contributions
8:50 - Recap: Straight-Through Estimator
13:25 - Encoding the discrete problem as an inner product
19:45 - From algorithm to distribution
23:15 - Substituting the gradient
26:50 - Defining a target distribution
38:30 -
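The middle steps of the outline (encoding the discrete problem as an inner product, then going from algorithm to distribution) can be illustrated with a small brute-force sketch, again assuming top-k selection as the discrete problem. The solver maximizes ⟨θ, z⟩ over a feasible set C, and the matching exponential family distribution p(z; θ) ∝ exp(⟨θ, z⟩/τ) over C has the solver's output as its mode, with marginals μ(θ) that sharpen toward it as τ shrinks. The function names and the temperature τ are illustrative assumptions; enumerating C only works for tiny problems, whereas real uses would call a MAP solver instead.

```python
import itertools
import math
from typing import Dict, List, Tuple

State = Tuple[int, ...]


def k_subset_states(n: int, k: int) -> List[State]:
    """All z in {0,1}^n with exactly k ones: the feasible set C of a top-k problem."""
    states = []
    for chosen in itertools.combinations(range(n), k):
        states.append(tuple(1 if i in chosen else 0 for i in range(n)))
    return states


def score(theta: List[float], z: State) -> float:
    """The solver's objective written as an inner product <theta, z>."""
    return sum(t * zi for t, zi in zip(theta, z))


def map_state(theta: List[float], states: List[State]) -> State:
    """What the discrete algorithm returns: argmax over z in C of <theta, z>."""
    return max(states, key=lambda z: score(theta, z))


def exp_family(theta: List[float], states: List[State], tau: float = 1.0) -> Dict[State, float]:
    """p(z; theta) = exp(<theta, z> / tau) / Z(theta) for z in C (zero outside C)."""
    logits = [score(theta, z) / tau for z in states]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return {z: w / total for z, w in zip(states, weights)}


def marginals(dist: Dict[State, float]) -> List[float]:
    """mu(theta) = E_p[z]: per-item probability of being selected."""
    n = len(next(iter(dist)))
    return [sum(p * z[i] for z, p in dist.items()) for i in range(n)]


if __name__ == "__main__":
    theta = [3.0, 2.0, 1.0, 0.0]
    states = k_subset_states(4, 2)        # the 6 feasible 2-subsets of 4 items
    print(map_state(theta, states))       # (1, 1, 0, 0)
    dist = exp_family(theta, states)
    print(max(dist, key=dist.get))        # the mode coincides with the MAP state
    print(marginals(exp_family(theta, states, tau=0.01)))  # close to [1, 1, 0, 0]
```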