Learning To See [WIP] (2017)

Early Work In Progress

WIP stills on flickr




A deep convolutional neural network learning to see. Multi-screen video installation.


We are currently seeing huge breakthroughs in artificial intelligence, particularly in the families of algorithms referred to as ‘Deep Learning’. Originally inspired by the neural networks of our own brains, particularly the visual cortex, and now highly modified and adapted to run on the parallelised silicon architectures of our computers, Deep Convolutional Neural Networks out-perform humans in many specific tasks. Although most of these algorithms have been around for decades, they have recently seen a huge boost in popularity and success, often attributed to increases in computing power and the availability of large training datasets. However, the undeniable driving force behind these recent developments is the multi-billion-dollar investment by the purveyors of mass surveillance, part backed by government organisations and the War on Terror, part backed by internet companies whose business models rely on targeted, psychographic advertising. The common motivation is the automation of *Understanding* Big Data: understanding text and documents, understanding sounds and understanding images. But what does it even mean to ‘understand’? What does it mean to ‘learn’ or to ‘see’? These questions have been asked in fields as diverse as philosophy, psychology, cognitive science, neuroscience, information theory and artificial intelligence; and now an answer is being imposed upon us via neoliberalism. This work explores these themes, and exposes the process of seeing, and particularly *learning* to see, through this lens.

High level description

This is a deep neural network that has not been trained on anything. It starts off completely blank*. It is literally ‘opening its eyes’ for the first time and trying to ‘understand’ what it sees. In this case ‘understanding’ means trying to find patterns, trying to find regularities in what it’s seeing now, and with respect to everything that it has seen so far, so that it can efficiently compress and organise incoming information in the context of its past experience. It’s trying to deconstruct the incoming signal, and reconstruct it using features that it has learnt based on what it has already seen – which, at the beginning, is nothing. When the network receives new information that is unfamiliar, or perhaps just from a new angle that it has not yet encountered, it’s unable to make sense of that new information. It’s unable to find an internal representation relating it to past experience; its compressor fails to successfully deconstruct and reconstruct. But the network is training in real time: it’s constantly learning, and updating its ‘filters’ and ‘weights’, to try and *improve its compressor*, to find more efficient internal representations, to build a more ‘universal world-view’ upon which it can hope to reconstruct future experiences. Unfortunately though, the network also ‘forgets’. When too much new information comes in, and it doesn’t re-encounter past experiences, it slowly loses the filters and representations required to reconstruct those past experiences.

These are not behaviours that I have explicitly programmed into the system; they are characteristic properties of deep neural networks which I’m exploiting and exploring.

* One might liken this to a newborn baby’s brain. However, this comparison is not entirely accurate. A newborn baby’s brain has had hundreds of millions of years of evolution shaping its neural wiring, and arguably the baby is born with many synaptic connections already in place. Here, the network ‘starts life’ with its full architecture intact, but all connections are initialised randomly. So the comparison may work metaphorically at a high level, but at a lower level the details are a bit different.
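The ‘learning’ and ‘forgetting’ described above can be reproduced in a toy setting. The sketch below is not the network used in the installation: it assumes a single-unit linear autoencoder with tied weights, two made-up four-pixel ‘scenes’ (`scene_a`, `scene_b`), a hand-picked starting weight vector and plain gradient descent, all chosen purely for illustration. It is shown only scene A, then only scene B, and the reconstruction error on scene A is measured at each stage.

```python
import math

def reconstruct(w, x):
    """Compress x to a single number, then decode it with the same weights."""
    code = sum(wi * xi for wi, xi in zip(w, x))
    return [wi * code for wi in w]

def error(w, x):
    """Squared reconstruction error for one input."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruct(w, x)))

def train_step(w, x, lr=0.1):
    """One online gradient-descent step on the reconstruction error."""
    code = sum(wi * xi for wi, xi in zip(w, x))
    resid = [xi - wi * code for wi, xi in zip(w, x)]
    w_dot_resid = sum(wi * ri for wi, ri in zip(w, resid))
    # Gradient of |x - w (w.x)|^2 with respect to w.
    grad = [-2.0 * (code * ri + w_dot_resid * xi) for ri, xi in zip(resid, x)]
    return [wi - lr * gi for wi, gi in zip(w, grad)]

# Two hypothetical, partly overlapping unit-length "scenes" (assumed data).
s = 1.0 / math.sqrt(2.0)
scene_a = [s, s, 0.0, 0.0]
scene_b = [0.0, s, s, 0.0]

# Small arbitrary starting weights (an assumption standing in for random init).
w = [0.1, -0.05, 0.05, 0.02]
err_a_untrained = error(w, scene_a)

# Phase 1: the network only ever sees scene A.
for _ in range(200):
    w = train_step(w, scene_a)
err_a_learned = error(w, scene_a)

# Phase 2: scene A disappears and the network only sees scene B.
for _ in range(200):
    w = train_step(w, scene_b)
err_a_forgotten = error(w, scene_a)
err_b_learned = error(w, scene_b)

print("error on A, untrained:       ", err_a_untrained)
print("error on A, after seeing A:  ", err_a_learned)
print("error on A, after only B:    ", err_a_forgotten)
print("error on B, after only B:    ", err_b_learned)
```

With only one unit of capacity, the weights that reconstruct scene A are overwritten once scene A stops appearing, so the error on A rises again after phase 2. A deep network has far more capacity, but the same mechanism is at work.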

Low level description

Details coming soon – though see links below. Very briefly: the network is a deep convolutional autoencoder (I’m currently playing with different architectures: variational, GAN etc). Traditional applications of convolutional or deep neural networks usually involve an ‘offline training phase’ – where the network is first trained on a ton of data – followed by an ‘inference’ (or ‘prediction’) phase – where the trained model is deployed, and makes predictions (or inferences). Here, the network is training on new data as it comes in* from the camera in real time, while I’m playing with various parameters (such as learning rate, momentum, gradient clipping, optimisation algorithm etc).

* This is referred to as ‘online’ learning (nothing to do with the internet).
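To make the difference from offline training concrete, here is a minimal sketch of an online loop in which every incoming frame is reconstructed with the current weights and then immediately used for one update. Everything in it is an assumption for illustration: `fake_camera` is a hypothetical stand-in for the live feed (one fixed four-pixel pattern with flickering brightness), the model is a single-unit linear autoencoder rather than a deep convolutional one, and the optimiser is plain SGD with momentum and gradient-norm clipping. The `lr`, `momentum` and `max_norm` arguments play the role of the parameters mentioned above.

```python
import math

def clip_by_norm(grads, max_norm):
    """Scale the gradient down if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]

def momentum_step(weights, velocity, grads, lr, momentum, max_norm):
    """One SGD-with-momentum update using the clipped gradient."""
    clipped = clip_by_norm(grads, max_norm)
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, clipped)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

def loss_and_grad(w, x):
    """Reconstruction loss |x - w (w.x)|^2 and its gradient w.r.t. w."""
    code = sum(wi * xi for wi, xi in zip(w, x))
    resid = [xi - wi * code for wi, xi in zip(w, x)]
    loss = sum(r * r for r in resid)
    w_dot_resid = sum(wi * ri for wi, ri in zip(w, resid))
    grad = [-2.0 * (code * ri + w_dot_resid * xi) for ri, xi in zip(resid, x)]
    return loss, grad

def fake_camera(n_frames):
    """Hypothetical camera: a fixed pattern under slowly flickering light."""
    base = [0.8, 0.6, 0.0, 0.0]
    for t in range(n_frames):
        brightness = 1.0 + 0.2 * math.sin(0.3 * t)
        yield [brightness * p for p in base]

weights = [0.1, 0.1, 0.1, 0.1]   # arbitrary small starting weights
velocity = [0.0, 0.0, 0.0, 0.0]
losses = []

# Online learning: there is no separate training phase. Each frame is
# reconstructed with the current weights, then used for a single update.
for frame in fake_camera(300):
    loss, grad = loss_and_grad(weights, frame)
    losses.append(loss)
    weights, velocity = momentum_step(
        weights, velocity, grad, lr=0.05, momentum=0.5, max_norm=1.0
    )

print("loss on first frame:", losses[0])
print("loss on last frame: ", losses[-1])
```

Changing `lr`, `momentum` or `max_norm` while the loop runs alters how quickly the weights chase the incoming frames, which is the kind of adjustment described above.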

References and recommended reading

Coming soon but some to start:
Convolutional Neural Networks:
Deepdream is blowing my mind
Background info for ‘Deepdream is blowing my mind’
ml4a ConvNets
cs231n Stanford ConvNet notes

journey through multiple dimensions and transformations in space and time (Section 7)
Stanford AE Tutorial
Keras AE Tutorial

JS formal theory of creativity & compression