[PR12] Generative Models as Distributions of Functions
1. Generative Models as Distributions of Functions
Understanding the paper together with PR12
Jaejun Yoo
(current) Postdoc @ EPFL
(from July) Assistant Prof. @ UNIST
PR-312, 11th April, 2021
2. Today’s contents
“For all datasets, we use an MLP with 3
hidden layers of size 128 … and an MLP
with 2 hidden layers of size 256 and 512”
“We performed all training on a single
2080Ti GPU with 11GB of RAM.”
3. Motivation and Main Problem
“Conventional signal representations are usually discrete.”
However, Mother Nature is continuous!
(well… up to the Planck constant…?)
[Figure: example signals: 2D images, audio, 3D shapes]
4. Motivation and Main Problem
Continuous representation? Why is it hard?
Of course, these functions are usually not analytically tractable: it is impossible to "write down" the function that parameterizes a natural image as a mathematical formula.
5. Motivation and Main Problem
Why important?
• Independent of spatial resolution (infinite resolution)
• Geometric transformations of images: zoom, rotation, super-resolution.
• Derivatives are well-defined.
7. Motivation and Main Problem
Why important?
[Figure: interpolation comparison: piecewise constant, bilinear, cubic spline]
12. Continuous representation?
• DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation (Park et al. 2019)
• Occupancy Networks: Learning 3D Reconstruction in Function Space (Mescheder et al. 2019)
• IM-Net: Learning Implicit Fields for Generative Shape Modeling (Chen et al. 2018)
• … NeRF (PR-302)…
“Implicit Neural Representations approximate this function via a neural network!”
Implicit Neural Representation!
13. Implicit Neural Representation
- Remarkably, the representation $f_\theta$ is independent of the number of pixels. The representation $f_\theta$, therefore, unlike most image representations, does not depend on the resolution of the image.
- The core property of these representations is that they scale with signal complexity and not with signal resolution.
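(A minimal PyTorch sketch of this idea, not the authors' code: the 64×64 image, layer sizes, and training loop are illustrative assumptions.)
```python
import torch
import torch.nn as nn

# Implicit neural representation in miniature: one MLP per image,
# mapping a 2D pixel coordinate (x, y) to an RGB value.
f_theta = nn.Sequential(
    nn.Linear(2, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 3),
)

image = torch.rand(64, 64, 3)  # stand-in for a real image
ys, xs = torch.meshgrid(
    torch.linspace(0, 1, 64), torch.linspace(0, 1, 64), indexing="ij"
)
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)  # (4096, 2)
pixels = image.reshape(-1, 3)                          # (4096, 3)

opt = torch.optim.Adam(f_theta.parameters(), lr=1e-3)
for _ in range(1000):  # overfit the MLP to this single image
    opt.zero_grad()
    loss = ((f_theta(coords) - pixels) ** 2).mean()
    loss.backward()
    opt.step()
# f_theta is now a continuous representation: it can be queried at
# any coordinate, not just the 64x64 grid it was trained on.
```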
14. Learning Distributions of Functions
1. Parameterizing a distribution over neural
networks with a hypernetwork (Ha et al., 2017)
Overall Scheme
“Sample the weights of a neural network”
to obtain a function.
: Learning a distribution over functions $f_\theta$ is equivalent to learning a distribution over weights $p(\theta)$.
: Then $p(\theta)$, where $\theta = g_\phi(z)$, is referred to as a neural function distribution (NFD).
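(Again a hedged sketch, not the paper's implementation: a hypernetwork $g_\phi$ with assumed layer sizes maps a latent $z$ to the flattened weights $\theta$ of a small coordinate MLP.)
```python
import torch
import torch.nn as nn

# Neural function distribution (NFD) sketch: a hypernetwork g_phi maps
# a latent z ~ p(z) to the flattened weights theta of a coordinate MLP
# f_theta. Sampling z therefore samples a function.
LATENT = 64
shapes = [(2, 128), (128, 128), (128, 3)]    # (in, out) per layer of f_theta
n_theta = sum(i * o + o for i, o in shapes)  # weights + biases

g_phi = nn.Sequential(
    nn.Linear(LATENT, 256), nn.ReLU(),
    nn.Linear(256, n_theta),
)

def f(theta, coords):
    """Apply the coordinate MLP whose flattened parameters are `theta`."""
    h, idx = coords, 0
    for li, (i, o) in enumerate(shapes):
        W = theta[idx:idx + i * o].view(i, o); idx += i * o
        b = theta[idx:idx + o];                idx += o
        h = h @ W + b
        if li < len(shapes) - 1:
            h = torch.relu(h)
    return h

z = torch.randn(LATENT)            # sample from the prior p(z)
theta = g_phi(z)                   # theta = g_phi(z): one draw from p(theta)
rgb = f(theta, torch.rand(10, 2))  # evaluate the sampled function anywhere
```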
15. Learning Distributions of Functions
However! How do we get access to the
ground truth functions to train the network?
16. Learning Distributions of Functions
“We do have access to input/output pairs of these functions through the coordinates and features, allowing us to learn function distributions without
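(An illustrative sketch of how an ordinary image already supplies such coordinate/feature pairs; grid size and coordinate range are our choices.)
```python
import torch

# An image is itself a set of input/output pairs of the underlying
# function: every pixel gives one (coordinate, feature) example.
image = torch.rand(64, 64, 3)  # stand-in for a real image
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, 64), torch.linspace(-1, 1, 64), indexing="ij"
)
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)  # (4096, 2) inputs
feats = image.reshape(-1, 3)                           # (4096, 3) outputs
pairs = (coords, feats)  # the supervision signal: no grid structure needed
```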
17. Learning Distributions of Functions
2. Training this distribution with an adversarial
approach (Goodfellow et al., 2014).
18. Learning Distributions of Functions
The input coordinates are first passed through a positional encoding (random Fourier features) before entering the MLP; a sketch follows below.
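(Sketch of a random Fourier feature encoding in the style of Tancik et al. (2020); the frequency count and scale below are assumptions, not the paper's values.)
```python
import math
import torch

# Random Fourier features: gamma(x) = [cos(2*pi*B x), sin(2*pi*B x)],
# with B a fixed random frequency matrix. This lets a plain ReLU MLP
# fit high-frequency detail in the signal.
torch.manual_seed(0)
B = torch.randn(128, 2) * 10.0  # random frequencies; scale 10 is a common choice

def fourier_features(coords):            # coords: (N, 2) in [-1, 1]
    proj = 2 * math.pi * coords @ B.T    # (N, 128)
    return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)  # (N, 256)

x = torch.rand(4096, 2) * 2 - 1
gamma_x = fourier_features(x)  # fed to the MLP instead of raw coordinates
```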
19. Learning Distributions of Functions
Overall Scheme
[Figure: the NFD generator in the overall scheme]
Now we know how to design a network to learn continuous functions!
20. Learning Distributions of Functions
Overall Scheme
[Figure: the discriminator in the overall scheme]
But the data we consider may not necessarily lie on a grid…
21. Learning Distributions of Functions
Overall Scheme
… in which case it is not possible to use convolutional discriminators.
22. Learning Distributions of Functions
Overall Scheme
Our discriminator should be able to distinguish between
real and fake sets of coordinate and feature pairs.
23. Point Cloud Discriminator
Point Convolution
In contrast to regular convolutions, where the convolution kernels are only defined at certain grid locations, the convolution filters in PointConv are parameterized by an MLP that maps coordinates to kernel values (see the sketch below).
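(A minimal sketch of this idea, simplified from PointConv (Wu et al., 2019): no neighborhood search, density weighting, or efficiency tricks, just the MLP-parameterized kernel.)
```python
import torch
import torch.nn as nn

# PointConv idea in miniature: instead of a kernel indexed by discrete
# grid offsets, an MLP maps continuous relative coordinates to kernel
# values, so the convolution is defined at arbitrary point locations.
C_IN, C_OUT = 3, 16

kernel_mlp = nn.Sequential(  # relative coordinate -> kernel weights
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, C_IN * C_OUT),
)

def point_conv(coords, feats, center):
    """Convolve features of points around `center` (all points as neighbors)."""
    rel = coords - center                      # (N, 2) continuous offsets
    W = kernel_mlp(rel).view(-1, C_IN, C_OUT)  # (N, C_IN, C_OUT) kernel values
    # sum over neighbors of feature * kernel, like a discrete convolution
    return torch.einsum("nc,nco->o", feats, W)  # (C_OUT,)

coords = torch.rand(100, 2)    # off-grid point locations
feats = torch.rand(100, C_IN)  # e.g., RGB features at those points
out = point_conv(coords, feats, center=coords[0])
```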
24. Experiments
Implementation Setup
“For all datasets, we use an MLP with 3
hidden layers of size 128 … and an MLP
with 2 hidden layers of size 256 and 512”
“We performed all training on a single
2080Ti GPU with 11GB of RAM.”
“Remarkably, such a simple architecture
is sufficient for learning rich distributions
of images and 3D shapes.“
“Use the exact same model for both
images and 3D shapes except for the
input and output dimensions of the
function representation.”
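(A quick sanity check of what the quoted sizes imply; the 2D-coordinate-to-RGB dimensions of the function MLP are our assumption.)
```python
# Parameter count for the function representation: an MLP with 3 hidden
# layers of size 128, mapping 2D coordinates to RGB (assumed dims).
dims = [2, 128, 128, 128, 3]
n_params = sum(i * o + o for i, o in zip(dims[:-1], dims[1:]))
print(n_params)  # 33795 weights the hypernetwork must output per function
```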
25. Results
2D Image generation
• Samples from our model trained on CelebAHQ.
• 64×64 (top) and 128×128 (bottom)
• Each image corresponds to a function which
was sampled from our model and then
evaluated on the grid.
• To produce this figure we sampled 5 batches
and chose the best batch by visual inspection.
26. Results
“To infinity and beyond!”
- Buzz Lightyear, Toy Story
Super-resolution
[Figure: NFD samples evaluated at 64×64 and at 256×256 vs. bicubic 256×256; NFD samples evaluated at 28×28 and at 256×256 vs. bicubic 256×256]
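(What "super-resolution" means here, as a sketch: render the same continuous function on a denser grid. The network f below is a random stand-in for a function sampled from the model.)
```python
import torch
import torch.nn as nn

# Super-resolution with an NFD is just evaluating the same continuous
# function on a denser coordinate grid.
f = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 3))

def render(f, res):
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, res), torch.linspace(-1, 1, res), indexing="ij"
    )
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    return f(coords).reshape(res, res, 3)

low = render(f, 64)    # 64x64 render
high = render(f, 256)  # 256x256 render of the very same function
```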
27. Results
3D shapes
Voxel grids from Choy et al. (2016) representing the chairs category of the ShapeNet (Chang et al., 2015) dataset. The dataset contains 6,778 chairs, each of dimension $32^3$. For each 3D model, we uniformly subsample K = 4096 points among the $32^3 = 32{,}768$ points and use them for training.
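(A sketch of this subsampling step; the random voxel grid stands in for a real ShapeNet chair.)
```python
import torch

# Subsample K = 4096 of the 32^3 = 32,768 voxel centers per 3D model,
# keeping (coordinate, occupancy) pairs as training data.
voxels = (torch.rand(32, 32, 32) > 0.5).float()  # stand-in for a chair
grid = torch.stack(torch.meshgrid(
    *[torch.linspace(-1, 1, 32)] * 3, indexing="ij"), dim=-1).reshape(-1, 3)
occ = voxels.reshape(-1, 1)                      # (32768, 1) occupancies

K = 4096
idx = torch.randperm(grid.shape[0])[:K]          # uniform subsample
coords, feats = grid[idx], occ[idx]              # the (x, y, z, occupancy) set
```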
28. Conclusion
Summary of Contributions (I think)
• A step towards making implicit neural representation methods genuinely useful for modeling datasets rather than individual data points.
• The first framework to model data of this complexity in an entirely continuous fashion.
• Independence from resolution and the ability to operate outside of a grid.
• A unique way of using point cloud discriminators.
29. Things to discuss…
• What kinds of studies could be derived from this?
• Architectural developments (better quality)?
• Then how? What would be helpful?
• Other applications?
• Again, compute-driven AI vs. human-knowledge-based?
• Big models vs. inductive bias?
• Etc.?