Latent spaces

Generative Adversarial Networks (GANs) exploit a key property of deep neural networks, their ability to configure themselves to perform a specific task from a corpus of examples (the so-called training dataset), in order to generate images (they can also be applied to generating text or sound). A GAN is composed of two networks: one that produces images (the generator network) and one that judges whether each image belongs to the category of the training dataset or not (the discriminator network). Training is a game in which each network configures itself in response to the other: the generator learns to produce images that the discriminator judges as belonging to the training images, and the discriminator learns to better discern which images belong and which do not. Over the course of this process, the generator produces images that are increasingly similar to the initial corpus.
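
The game between the two networks can be sketched in a few lines. What follows is a deliberately tiny illustration, not a real image GAN: single numbers clustered around 4.0 stand in for training images, the generator is the hypothetical linear function a*z + b, the discriminator is a single logistic unit, and the gradients are written out by hand. All names, sizes, and learning rates are assumptions made for this example.

```python
import math
import random

rng = random.Random(0)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def real_batch(n):
    # Hypothetical "training dataset": numbers near 4.0 stand in for images.
    return [rng.gauss(4.0, 0.5) for _ in range(n)]

def noise_batch(n):
    # Random inputs that the generator turns into candidate samples.
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def generate(gen, zs):
    a, b = gen
    return [a * z + b for z in zs]

def discriminate(disc, xs):
    # Estimated probability that each sample comes from the training data.
    w, c = disc
    return [sigmoid(w * x + c) for x in xs]

def discriminator_loss(disc, reals, fakes):
    d_real = discriminate(disc, reals)
    d_fake = discriminate(disc, fakes)
    return -(sum(math.log(p) for p in d_real) / len(reals)
             + sum(math.log(1.0 - p) for p in d_fake) / len(fakes))

def generator_loss(gen, disc, zs):
    # Low when the discriminator takes generated samples for real ones.
    d_fake = discriminate(disc, generate(gen, zs))
    return -sum(math.log(p) for p in d_fake) / len(zs)

def discriminator_step(disc, reals, fakes, lr):
    # One gradient-descent step on discriminator_loss (generator held fixed).
    w, c = disc
    grad_w = grad_c = 0.0
    for x, p in zip(reals, discriminate(disc, reals)):
        grad_w += (p - 1.0) * x / len(reals)
        grad_c += (p - 1.0) / len(reals)
    for x, p in zip(fakes, discriminate(disc, fakes)):
        grad_w += p * x / len(fakes)
        grad_c += p / len(fakes)
    return (w - lr * grad_w, c - lr * grad_c)

def generator_step(gen, disc, zs, lr):
    # One gradient-descent step on generator_loss (discriminator held fixed).
    a, b = gen
    w, _ = disc
    grad_a = grad_b = 0.0
    for z, p in zip(zs, discriminate(disc, generate(gen, zs))):
        grad_a += (p - 1.0) * w * z / len(zs)
        grad_b += (p - 1.0) * w / len(zs)
    return (a - lr * grad_a, b - lr * grad_b)

gen = (1.0, 0.0)   # generator parameters (a, b)
disc = (0.0, 0.0)  # discriminator parameters (w, c)
for _ in range(200):
    reals, zs = real_batch(64), noise_batch(64)
    # The discriminator improves against the current generator...
    disc = discriminator_step(disc, reals, generate(gen, zs), lr=0.05)
    # ...and the generator then improves against the updated discriminator.
    gen = generator_step(gen, disc, noise_batch(64), lr=0.05)
```

Each network only ever improves relative to the current state of the other, which is the sense in which they configure themselves "thanks to the other". A toy loop like this is not guaranteed to settle; real GANs use deep networks on both sides and considerably more careful training.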

When this self-configuration process is finished, we have a tool capable of producing images that are very similar to those we provided, yet it is not limited to them: it is open to almost infinite variations. The training images are, hypothetically, among the images the trained GAN can generate, but between any two of them there are an infinite number of variations, some more similar and others more different.

If we imagine the hypothetical corpus of all the images the network could generate, we could picture it laid out in a two-dimensional space where the most similar images lie closest to each other and where, therefore, we could move from one image to another, transforming it gradually through small differences. A trained GAN contains, in a way, a space like this, though with not two dimensions but many more (of the order of hundreds). Together, these dimensions form a multidimensional framework in which each possible image can be understood as a point within a complex coordinate system. This framework is called the “latent space”.
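
Moving between two images through small differences corresponds, in practice, to walking along a line between two points of the latent space and generating an image at each step. The sketch below illustrates the idea under stated assumptions: the latent dimension of 128 is arbitrary, and `toy_generator` is a hypothetical stand-in (a fixed random map producing 16 "pixel" values) for a trained generator network.

```python
import math
import random

LATENT_DIM = 128  # assumed; the text only says "of the order of hundreds"
rng = random.Random(0)

# Hypothetical stand-in for a trained generator: a fixed random linear map
# followed by tanh, turning a latent point into 16 "pixel" values in (-1, 1).
WEIGHTS = [[rng.gauss(0.0, 1.0) / math.sqrt(LATENT_DIM) for _ in range(LATENT_DIM)]
           for _ in range(16)]

def toy_generator(z):
    return [math.tanh(sum(w * x for w, x in zip(row, z))) for row in WEIGHTS]

def random_latent():
    # A random point in the latent space.
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

def interpolate(z_start, z_end, steps):
    """Return `steps` latent points evenly spaced on the line from z_start to z_end."""
    points = []
    for i in range(steps):
        t = i / (steps - 1)
        points.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return points

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

z_a, z_b = random_latent(), random_latent()
path = interpolate(z_a, z_b, steps=10)
frames = [toy_generator(z) for z in path]

# How much the output changes between neighbouring points on the path.
step_sizes = [distance(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
```

Each entry of `frames` plays the role of one image along the path, and `step_sizes` measures how far apart consecutive ones are. Straight-line interpolation is only the simplest choice; other paths through the latent space are possible.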