Rastko Ciric, Riley De Haan, Rafael Rafailov
Generative Adversarial Networks (GANs) have been a major breakthrough in deep generative modeling. There have been many extensions and refinements of the original idea, including InfoGAN and BiGAN, which we focus on in this project. An InfoGAN adds structure variables to the noise input and applies a mutual-information-based regularization penalty between the structure variables and the generated output. This disentangles the representation, so that the structure variables correspond to semantic properties of the generated images (class, rotation, width, etc.). A BiGAN, on the other hand, trains a Generator and an Encoder in parallel via the same adversarial game a regular GAN uses. In this project we combine both architectures. We keep the structure of a BiGAN but add the mutual information regularizer along the Generator branch, which also forces the Encoder to learn disentangled representations of the input data that correspond to semantic features, in a completely unsupervised way. A more detailed report is available here: https://www.overleaf.com/read/szqphxfkrjbj
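For readers who prefer formulas, the objectives described above can be written out roughly as follows. The first three lines are the standard GAN, InfoGAN, and BiGAN objectives from the literature. The last line is only a plausible sketch of how the two could be combined, not necessarily the exact loss used in our report. The symbols $\lambda$ (regularization weight), $Q$ (auxiliary distribution that predicts the structure variables $c$ from an image), and $L_I$ (variational lower bound on the mutual information) are notation introduced here for illustration.

```latex
\begin{align}
% Standard GAN value function
V(D, G) &= \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)]
         + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \\
% InfoGAN: the generator takes (z, c); L_I lower-bounds I(c; G(z, c))
\min_{G, Q} \max_{D} \;& V(D, G) - \lambda L_I(G, Q), \qquad
L_I(G, Q) = \mathbb{E}_{c \sim p(c),\, x \sim G(z, c)}[\log Q(c \mid x)] + H(c) \\
% BiGAN: the discriminator sees (image, latent) pairs
V_{\text{Bi}}(D, G, E) &= \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x, E(x))]
         + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z), z))] \\
% Assumed combination: BiGAN game over the latent (z, c) plus the InfoGAN penalty
\min_{G, E, Q} \max_{D} \;& V_{\text{Bi}}(D, G, E) - \lambda L_I(G, Q)
\end{align}
```

In the last line the latent fed to the Generator and paired with generated images is the concatenation $(z, c)$ rather than $z$ alone.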
To start, we would say that the outreach event was a pleasant surprise and one of the highlights of our quarter. The event may have sounded out of place at first in a graduate engineering class, but distilling a discipline as abstract as information theory (and, in the case of our project, InfoBiGANs) was fun. It is also something that we have all likely struggled with at times, as Tsachy described with his daughter, when explaining to our own families the topics we study: AI, human-computer interaction, information theory, etc. We felt our outreach was successful and engaged the kids who came by. We started with a brief overview of neural networks as “a lot of cells that talk to each other, just like you and I are talking to each other now” and that “together can do cool stuff”. At this point we introduced the idea of recognizing “cats” with lots of neurons before coming to probably the most fun (but also most worrisome) part of the talk, in which we played a game called NameCat.
In the game, we asked the students to enter their name into a prompt. The name was then hashed into latent codes and fed to a pre-trained StyleGAN implementation to generate a personal cat for that student’s name (Kedar’s idea). We of course had no control over how their name would be translated into “cat”, but were pretty happy with how well-behaved the StyleGAN outputs were. Some of the later students got demonic, dark furballs with beady eyes, and one girl who got such a cat was sad and wondered why her name had resulted in such a disturbing feline. We let her play the game again, this time capitalizing her name, but we don’t think the result was much better. In the end though, most students were happy with their cats, and lit up when the StyleGAN finally came to the magical conclusion of “all the neurons talking with each other to figure out what YOUR cat looks like!”.
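The name-to-latent step can be sketched as below. This is a minimal illustration, not the code used at the event: the choice of SHA-256, the latent size of 512, and the function name `name_to_latent` are all assumptions, and the call into the pre-trained StyleGAN is left out entirely.

```python
import hashlib
import random
from typing import List

LATENT_DIM = 512  # assumed; the latent size used at the event is not stated


def name_to_latent(name: str, dim: int = LATENT_DIM) -> List[float]:
    """Map a name to a deterministic pseudo-random latent vector (hypothetical helper)."""
    # Hash the raw UTF-8 bytes, so "ada" and "Ada" give different digests.
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Interpret the digest as an integer seed for a private random generator.
    seed = int.from_bytes(digest, "big")
    rng = random.Random(seed)
    # Draw each coordinate from a standard normal, the usual GAN latent prior.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


z_lower = name_to_latent("ada")
z_upper = name_to_latent("Ada")
# Same name gives the same vector; a different capitalization gives a new one.
print(len(z_lower), z_lower == name_to_latent("ada"), z_lower == z_upper)
```

Because the hash is case-sensitive, capitalizing a name produces an unrelated latent vector, which is why retyping a name with a capital letter yields a completely different cat rather than a slightly adjusted one.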
At this point we tried to describe GANs in terms of the adversarial “game” played between the generator network, which tries to create fake pictures, and the discriminator network, which tries to tell which images are fake and which are real, with both networks learning in the process. We then asked the students to play what we called the “discriminator game” and see if they could discriminate between photos of real people and photos generated by the StyleGAN from https://thispersondoesnotexist.com/ (which, while definitely distinguishable, are somewhat disturbingly realistic). In most cases the kids were quite surprised to find out which photos were actually fake (only one student correctly recognized all three fakes).
Finally, we talked about our own project, the InfoBiGAN. We showed the training process in terms of the images produced by the network over the course of training, which start out as pure noise and quickly converge to recognizable digits. We also showed a video of a trained InfoGAN generating fake faces in which features such as facial orientation, face shape, hair style, and hair color are varied continuously.
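A video like this is typically made by holding the noise fixed and sweeping one structure variable across a range, generating one frame per value. The sketch below only builds the generator inputs for such a sweep. The helper name `traversal_inputs`, the range of -2 to 2, and the layout of the input (noise followed by structure codes) are assumptions for illustration, and no actual generator is called.

```python
import random
from typing import List


def traversal_inputs(noise: List[float], codes: List[float], index: int,
                     low: float = -2.0, high: float = 2.0,
                     steps: int = 5) -> List[List[float]]:
    """Build generator inputs that sweep one structure code and fix the rest."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    inputs = []
    for k in range(steps):
        # Evenly spaced value between low and high for the swept code.
        value = low + (high - low) * k / (steps - 1)
        swept = list(codes)
        swept[index] = value
        # Assumed input layout: the fixed noise followed by the structure codes.
        inputs.append(list(noise) + swept)
    return inputs


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
frames = traversal_inputs(noise, codes=[0.0, 0.0], index=1)
print([frame[-1] for frame in frames])  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

Each entry of `frames` would be passed to the trained generator to produce one frame. If the swept code has been disentangled, only one semantic feature (for example, facial orientation) should change from frame to frame.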
In the end, we gave a decent introduction to neural networks and generative models, which the kids seemed not only to understand and engage with, but also to enjoy (we certainly did).
Thank you, Tsachy and the rest of the course staff, for such a great quarter!