Learning to See: Hello, World! (2017)

For a much deeper conceptual and technical analysis, please see Chapter 4 of my PhD Thesis.


A Deep Neural Network (DNN) opening its eyes for the first time, and trying to understand what it sees as it trains in realtime on a live camera feed.

“Hello, World!” is custom software that performs realtime DNN (CNN-VAE) training on a live video feed, while allowing a user to manipulate a number of hyperparameters in realtime (e.g. using faders on a MIDI controller). These hyperparameters include learning rate, momentum, gradient clipping thresholds, optimizer function, loss function, regularization weights and many more. Observing the results of these hyperparameter manipulations in realtime helps build a qualitative understanding of how they impact the training process; a minimal sketch of such a loop follows below.
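The sketch below is not the actual “Hello, World!” code; it is only an illustration of the core idea under stated assumptions: PyTorch and OpenCV as the stack, a tiny convolutional VAE, and a hypothetical read_faders() stub standing in for polling a MIDI controller (e.g. with the mido library). The key point is that hyperparameters are re-read on every training step, so moving a fader changes the optimizer mid-training.

```python
# A minimal sketch, NOT the actual "Hello, World!" code: train a tiny
# convolutional VAE on live camera frames while re-reading hyperparameters
# on every step. Assumes PyTorch + OpenCV; read_faders() is a hypothetical
# stand-in for polling a MIDI controller (e.g. with the `mido` library).

import cv2
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyConvVAE(nn.Module):
    def __init__(self, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),    # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Flatten())
        self.fc_mu = nn.Linear(64 * 16 * 16, z_dim)
        self.fc_logvar = nn.Linear(64 * 16 * 16, z_dim)
        self.fc_dec = nn.Linear(z_dim, 64 * 16 * 16)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.dec(self.fc_dec(z).view(-1, 64, 16, 16))
        return recon, mu, logvar

def read_faders():
    # Hypothetical stub: in the real piece these would come from MIDI faders.
    return {"lr": 1e-3, "kl_weight": 1.0, "clip": 1.0}

model = TinyConvVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (64, 64))
    x = torch.from_numpy(frame).float().div(255).permute(2, 0, 1).unsqueeze(0)

    hp = read_faders()              # re-read hyperparameters every single step
    for g in opt.param_groups:
        g["lr"] = hp["lr"]          # learning-rate changes take effect live

    recon, mu, logvar = model(x)
    rec_loss = F.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = rec_loss + hp["kl_weight"] * kl

    opt.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), hp["clip"])
    opt.step()

    # Show the network's reconstruction, i.e. what it currently "sees".
    out = recon.detach().squeeze(0).permute(1, 2, 0).numpy()
    cv2.imshow("reconstruction", out)
    if cv2.waitKey(1) == 27:        # Esc to quit
        break
```

Because the window shows the reconstruction rather than the raw camera frame, what you watch is the network’s current “understanding” of the scene: noise at first, resolving into an increasingly confident rendering as training proceeds. The real piece exposes many more controls (optimizer and loss function choice, momentum, regularization weights); the sketch varies only three for brevity.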

Furthermore, it turns out that realtime, continuous manipulation of hyperparameters during training has great performative potential, turning the training DNN into a Deep Visual Instrument. With realtime continuous control and feedback, it’s relatively trivial (and fun) to find hyperparameter configurations that produce “stable oscillation points”, modulate oscillation frequency and/or amplitude, or create “controlled explosions”.

Finally, as the network “learns” more about its environment, it starts projecting its expectations onto the incoming signal, giving rise to many fascinating emergent behaviours around “reconstructing memories” and “reminiscing”.

Walkthrough

Moments of interest: 1:58-2:35, 2:47-5:10


This neural network has not been trained on anything. It starts off completely blank (i.e. randomly initialized). It is opening its eyes for the first time and trying to ‘understand’ what it sees. In this context, ‘understanding’ means finding patterns and regularities in what it’s seeing, so that it can efficiently compress and organise incoming information in the context of its past experience, and make accurate, efficient predictions of the future.
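Since the network is a CNN-VAE, this trade-off between compact codes and accurate predictions has a precise, standard form: VAE training maximizes the evidence lower bound (ELBO), where q(z|x) is the encoder, p(x|z) the decoder and p(z) a simple prior over latent codes:

```latex
\mathcal{L}(x) =
  \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]}
             _{\text{prediction: reconstruct the input well}}
  \;-\;
  \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}
             _{\text{compression: keep the latent code simple}}
```

The first term is the drive to make accurate predictions; the second is the pressure to compress and organise.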

But the network is training in realtime: it is constantly learning, updating its ‘filters’ and ‘weights’, trying to improve its compressor, to find more optimal and compact internal representations, and to build a more ‘universal world-view’ upon which it can hope to reconstruct future experiences. Unfortunately, the network also ‘forgets’. When too much new information comes in and it doesn’t re-encounter past experiences, it slowly loses the filters and representations required to reconstruct those past experiences.
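This is the well-known phenomenon of catastrophic forgetting under online training on a non-stationary stream. It is not something the piece counteracts, but a hypothetical way to make “re-encountering past experiences” concrete in a sketch like the one above is a small replay buffer, which occasionally mixes remembered frames back into training:

```python
# Hypothetical addition to the training-loop sketch above: a bounded replay
# buffer. Training only on the newest frame forgets fastest; sampling a few
# past frames back in lets the network "re-encounter past experiences".

import random
from collections import deque

import torch

buffer = deque(maxlen=256)   # bounded memory of recent frames (oldest drop off)

def training_batch(x, k=4):
    """Batch the current frame with up to k randomly sampled past frames."""
    past = random.sample(list(buffer), min(k, len(buffer)))
    buffer.append(x)         # remember the current frame for later replay
    return torch.cat([x] + past, dim=0) if past else x
```

Shrinking the buffer to a single frame recovers the behaviour described above: once old frames fall out of memory, the filters that reconstructed them are slowly overwritten.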

These behaviours are not something I have explicitly programmed into the system. They are characteristic properties of deep neural networks which I’m exploiting and exploring.

* One might liken this to a newborn baby’s brain. This comparison may work metaphorically; however, it is not entirely accurate. A newborn baby’s brain has had hundreds of millions of years of evolution shaping its neural wiring, and arguably a baby is born with many synaptic connections already in place. In this work, however, the artificial neural network ‘starts life’ with its full architecture intact, but with all connections initialised randomly.


Background

“Learning To See” is an ongoing series of works that use state-of-the-art Machine Learning algorithms as a means of reflecting on ourselves and how we make sense of the world. The picture we see in our conscious mind is not a mirror image of the outside world, but a reconstruction based on our expectations and prior beliefs. In “Learning To See”, an artificial neural network, loosely inspired by our own visual cortex, looks through cameras and tries to make sense of what it sees. Of course it can only see what it already knows. Just like us.

The work is part of a broader line of inquiry into self-affirming cognitive biases, our inability to see the world from others’ points of view, and the resulting social polarization.


Originally loosely inspired by the neural networks of our own brains, Deep Learning Artificial Intelligence algorithms have been around for decades, but they have recently seen a huge rise in popularity. This is often attributed to recent increases in computing power and the availability of extensive training data. However, progress is undeniably fueled by multi-billion-dollar investments from the purveyors of mass surveillance: technology companies whose business models rely on targeted, psychographic advertising, and government organizations focussed on the War on Terror. Their aim is the automation of understanding big data, i.e. understanding text, images and sounds. But what does it mean to ‘understand’? What does it mean to ‘learn’ or to ‘see’?

Related work

Part of the Learning to See series.

Acknowledgements

Created during my PhD at Goldsmiths, funded by EPSRC UK.