1. How is the supremum taken outside when calculating the f-divergence?
  2. Why do we require a critic network when the aim is to build a generator that can allow random sampling from ?
  3. What does transforming one marginal into another mean in the Wasserstein metric?
  4. How does the Wasserstein metric tackle the vanishing gradient problem?
  5. What is in the Kantorovich-Rubinstein duality?