Learning to see
An artificial neural network making predictions on live webcam input, trying to make sense of what it sees, in context of what it’s seen before.
It can see only what it already knows, just like us.
Originally inspired by the neural networks of our own brain, Deep Learning Artificial Intelligence algorithms have been around for decades, but they are recently seeing a huge rise in popularity. This is often attributed to recent increases in computing power and the availability of extensive training data. However, progress is undeniably fueled by the multi-billion dollar investments from the purveyors of mass surveillance: technology companies whose business models rely on targeted, psychographic advertising; and government organisations and their War on Terror. Their aim is the automation of *Understanding* Big Data, i.e. understanding text, images and sounds. But what does it mean to ‘understand’? What does it mean to ‘learn’ or to ‘see’? Can a machine truly understand what it is seeing?
“Learning To See” is an ongoing series of works that use state-of-the-art Machine Learning algorithms as a means of reflecting on ourselves and how we make sense of the world. The picture we see in our conscious minds is not a direct representation of the outside world, or of what our senses deliver, but is of a simulated world, reconstructed based on our expectations and prior beliefs. Artificial neural networks loosely inspired by our own visual cortex look through surveillance cameras and try to make sense of what they are seeing. Of course they can see only what they already know. Just like us.
The work is part of a broader line of inquiry about self-affirming cognitive biases, our inability to see the world from others’ points of view, and the resulting social polarisation.
The series consists of a number of studies, each motivated by related but different ideas. More here.
These images are not using ‘style transfer’. In style transfer, the network is generally run on, and contains information about, a single image. These networks contain knowledge of an entire dataset: hundreds of thousands of images. Having been trained on these datasets, when the networks look at the world, they can only see through the filter of what they have seen before.
Hubble: We are made of star dust
In this case the network has been trained on images scraped from the Hubble Space Telescope. Everything that it sees, it can only make sense of in terms of stars, galaxies, nebulae, supernovae etc. – in the hearts of which all elements in the universe were forged, including those in our own bodies. Machine Learning is a tool which combines humanity’s fascination with unlocking the mysteries of the universe with our obsession with playing god: both in trying to understand and tame nature, and in creating life and intelligence. And we are creating this intelligence in our own image, seeing everything tinted by its past experiences.
Google Art: Learning to dream
A deep artificial neural network is trained on images scraped from the Google Art Project – a brief, incomplete survey of human (mostly western) Art, as collected by Google, Keeper of our collective consciousness. It sees everything we see, knows everything we know, feels everything we feel. Living up in The Cloud, of all places, it watches over us, listening to our thoughts and dreams in ones and zeros. A digital god for a digital culture.
Tens of thousands of images scraped from the Google Art Project, containing scans from art collections and museums from all over the world. These include paintings, illustrations, sketches and photographs covering landscapes, portraits, religious imagery, pastoral scenes, maritime scenes, scientific illustrations, prehistoric cave paintings, abstract images, cubist and realist paintings and many more – an extensive (yet vastly incomplete) archive of human imagination, feelings, desires and dreams; as catalogued by the Keeper of our collective consciousness, Google.
We have a very intimate connection with the cloud. We confide in it. We confess to it. We appeal to it. We share secrets with it, secrets that we wouldn’t share with our family or closest friends. And Google is the Keeper of our collective consciousness. It sees everything we see, knows everything we know, feels everything we feel. Living up in The Cloud, of all places, it watches over us, listening to our thoughts and dreams in ones and zeros. And now, just as the Church – the previous bastion of our Spiritual Overseer – used to be the purveyor of Art & Culture; now Google – bastion of our new Digital Overseer – is moving into that role too.
- ‘Unconditional samples’ aka ‘hallucination’ images. The trained network produces ‘random’ images: a random signal (e.g. white noise) is fed into the trained network. The network acts as a filter, shaping that random signal into something that it (remotely) recognises.
- ‘Learning visualised’ videos i.e. ‘training’. Each frame is the result of the network running one single iteration of ‘learning’, and then hallucinating; re-evaluating, re-imagining and reconstructing what it knows.
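The unconditional sampling described above can be sketched in a few lines: draw a white-noise latent vector and push it through a trained generator, which reshapes the noise into something resembling its training data. This is a minimal illustrative sketch only – the generator below is a toy stand-in with random fixed weights and a made-up latent size, not the actual DCGAN/pix2pix models used in the work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a "trained" generator: a single dense layer
# mapping a 100-dim latent vector to a 64x64 grayscale image.
# (Hypothetical sizes; a real DCGAN generator is a deep conv net.)
LATENT_DIM = 100
IMG_SIDE = 64
W = rng.standard_normal((LATENT_DIM, IMG_SIDE * IMG_SIDE)) * 0.01

def generate(z):
    """Map a latent vector z to an image with pixels in [-1, 1]."""
    return np.tanh(z @ W).reshape(IMG_SIDE, IMG_SIDE)

# 'Unconditional samples': white-noise latents in, 'hallucinated'
# images out. The fixed weights W act as the filter shaping the noise.
samples = [generate(rng.standard_normal(LATENT_DIM)) for _ in range(4)]
print(samples[0].shape)  # (64, 64)
```

For the ‘learning visualised’ videos, the same sampling step would simply be interleaved with training: run one optimisation iteration, sample, render a frame, repeat.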
Created during my PhD at Goldsmiths, funded by the EPSRC UK.
An early version of this software can be found on my GitHub.
Many thanks to
- Isola et al. for pix2pix and @affinelayer (Christopher Hesse) for the TensorFlow port
- Radford et al. for DCGAN and @carpedm20 (Taehoon Kim) for the TensorFlow port
- The TensorFlow team
- Countless others who have contributed to the above, either directly or indirectly, or open-sourced their own research, making the above possible
- My wife for putting up with me working on a bank holiday to clean up my code and upload this repo.