[Paper review] Generative Adversarial Nets

Generative Adversarial Nets (https://arxiv.org/pdf/1406.2661.pdf)

0. Introduction

Proposes a new framework for estimating generative models via an adversarial process:
i) Generative model G: captures the data distribution
ii) Discriminative model D: estimates the probability that a sample came from the training data rather than from G
iii) G is trained to maximize the probability of D making a mistake

1. Related work

- Undirected graphical models with latent variables (Restricted Boltzmann Machines, DBMs): intractable except in the most trivial instances.
- Deep belief networks (DBNs): hybrid models containing a single undirected layer and several directed layers. A fast approximate layer-wise training criterion exists, but DBNs incur the computational difficulties associated with both undirected and directed models.
- Criteria that do not approximate or bound the log-likelihood: score matching, noise-contrastive estimation (NCE). In NCE, a discriminative training criterion is employed to fit a generative model; however, rather than fitting a separate discriminative model, the generative model itself is used to discriminate generated data from samples drawn from a fixed noise distribution.
- Generative stochastic network (GSN) framework, generalizing denoising auto-encoders: defines a parameterized Markov chain, i.e., one learns the parameters of a machine that performs one step of a generative Markov chain.
  • The adversarial nets framework does not require a Markov chain for sampling, so it avoids the difficulties that come with one.

2. How it works

D and G play the following two-player minimax game with value function V(G, D).
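
From the paper, the value function is

    min_G max_D V(D, G) = E_{x ~ p_data(x)}[ log D(x) ] + E_{z ~ p_z(z)}[ log(1 - D(G(z))) ]

where p_z(z) is the input noise prior: D is trained to assign the correct label to real and generated samples, while G is trained to minimize log(1 - D(G(z))).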

The game is implemented with an iterative, numerical approach: alternate k steps of optimizing D with one step of optimizing G. In practice, early in learning, when G is poor, D can reject samples with high confidence because they are clearly different from the training data; log(1 - D(G(z))) then saturates, so instead of training G to minimize log(1 - D(G(z))), one can train G to maximize log D(G(z)), which provides much stronger gradients early in learning.

for number of training iterations:
    for k steps:
        sample minibatch of m noise samples {z(1), ..., z(m)} from noise prior p_g(z)
        sample minibatch of m examples {x(1), ..., x(m)} from data generating distribution p_data(x)
        update D by ascending its stochastic gradient:
            ∇_θd (1/m) Σ_i [ log D(x(i)) + log(1 - D(G(z(i)))) ]

    sample minibatch of m noise samples {z(1), ..., z(m)} from noise prior p_g(z)
    update G by descending its stochastic gradient:
        ∇_θg (1/m) Σ_i log(1 - D(G(z(i))))
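
As a concrete illustration, here is a minimal PyTorch sketch of this training loop on 1-D toy data. The network sizes, optimizer settings, and the Gaussian "real" data distribution are my own assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn as nn

# Toy setup (assumed, not from the paper): G maps 8-d noise to a 1-d sample,
# D maps a 1-d sample to the probability that it came from the training data.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_d = torch.optim.SGD(D.parameters(), lr=0.05)
opt_g = torch.optim.SGD(G.parameters(), lr=0.05)
m, k, eps = 64, 1, 1e-8  # minibatch size, D steps per G step, log() safety

for it in range(5000):
    for _ in range(k):
        x = 2.0 + 0.5 * torch.randn(m, 1)  # "real" data: N(2, 0.25) (assumed)
        z = torch.randn(m, 8)              # noise prior
        # Ascend D's objective log D(x) + log(1 - D(G(z)))
        # by descending its negation; detach G(z) so only D is updated here.
        loss_d = -(torch.log(D(x) + eps)
                   + torch.log(1 - D(G(z).detach()) + eps)).mean()
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    z = torch.randn(m, 8)
    # Descend G's objective log(1 - D(G(z))); in practice the paper suggests
    # maximizing log D(G(z)) instead when D rejects samples too confidently.
    loss_g = torch.log(1 - D(G(z)) + eps).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Here k = 1, the least expensive option, which is also what the paper uses in its experiments.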

3. Advantages and disadvantages

Advantages: Markov chains are never needed; only backprop is used to obtain gradients; no inference is needed during learning; and a wide variety of functions can be incorporated into the model.
Disadvantages: D must be kept well synchronized with G during training. In particular, G must not be trained too much without updating D; otherwise G may collapse too many values of z to the same value of x, losing the diversity needed to model p_data.