YOLO9000 is an object-detection neural network trained on millions of images labelled with a vocabulary of 9,418 words. The experiments that follow explore how it functions: what it sees and how it speaks.
In artificial vision, choosing the words used to describe an image is the least automatic task: it is humans who are entrusted with it. The machine uses these glossaries to act as our best pupil: it learns what we make it see. Getting artificial vision to work means educating it in a particular way of seeing. The experiments that follow are based on replacing the YOLO9000 vocabulary with other lists of words.
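As a purely illustrative sketch of the idea (the file name, the detection structure and the helper functions below are hypothetical, not part of YOLO9000 itself), swapping the vocabulary can be thought of as keeping the network's numeric class indices while attaching a different list of words to them:

```python
def load_vocabulary(path):
    """Read one label per line, as in darknet-style .names files."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def relabel(detections, vocabulary):
    """Replace each numeric class index with a word from the new list."""
    labelled = []
    for class_id, confidence, box in detections:
        word = vocabulary[class_id % len(vocabulary)]  # wrap around if the lists differ in length
        labelled.append((word, confidence, box))
    return labelled

if __name__ == "__main__":
    # Hypothetical output of a detector forward pass: (class_id, confidence, (x, y, w, h))
    detections = [(4, 0.87, (120, 60, 200, 180)), (17, 0.55, (300, 40, 80, 120))]
    art_words = load_vocabulary("art_vocabulary.names")  # custom word list, one word per line
    for word, conf, box in relabel(detections, art_words):
        print(f"{word}: {conf:.2f} at {box}")
```

The network keeps seeing the same shapes; only the words it is allowed to speak with change.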
Experiments in which artificial-vision neural networks have been trained with categories drawn from the world of art (artistic styles, museum collections or individual artists) or from concepts related to the production of images (tools or notions of composition).
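By way of a hedged sketch of such a training run (the folder name, backbone choice and hyperparameters are assumptions for illustration, not the settings actually used in these experiments, and a recent torchvision is assumed), each category becomes a folder of images and a pretrained classifier is re-trained so that its output vocabulary matches the new list:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Each subfolder of "art_categories/" becomes one training class.
dataset = datasets.ImageFolder("art_categories", transform=preprocess)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Start from an ImageNet-pretrained backbone and replace its final layer
# so that its outputs correspond to the new category list.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))

optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```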
Experiments made with Pix2Pix, a GAN (generative adversarial network), that is, a network designed to generate images. This kind of network was conceived principally to transform the style of an image. It is trained on pairs of images: from this training, the network learns to go automatically from one image of each pair to the other. In our experiments we have attempted to produce a machine imagination (the neural network after training) and to stimulate it to produce unexpected results.
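As a minimal, non-authoritative sketch of how the training pairs might be prepared (assuming the side-by-side "AB" image format used by several public Pix2Pix implementations; the directory names are placeholders), each input image and its target are stitched into a single image:

```python
from pathlib import Path
from PIL import Image

def make_pairs(dir_a, dir_b, out_dir, size=(256, 256)):
    """Stitch each matching image from dir_a and dir_b into one side-by-side AB image."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path_a in sorted(Path(dir_a).glob("*.jpg")):
        path_b = Path(dir_b) / path_a.name
        if not path_b.exists():
            continue  # keep only images present in both folders
        a = Image.open(path_a).convert("RGB").resize(size)
        b = Image.open(path_b).convert("RGB").resize(size)
        pair = Image.new("RGB", (size[0] * 2, size[1]))
        pair.paste(a, (0, 0))          # input image on the left
        pair.paste(b, (size[0], 0))    # target image on the right
        pair.save(out / path_a.name)

if __name__ == "__main__":
    make_pairs("dataset/input", "dataset/target", "dataset/pairs")
```

Once trained on such pairs, the generator can be fed images it has never seen, which is where the unexpected results appear.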
Initial experiments with artificial vision tools focused on the description of images and facial analysis.