I’m delighted to learn that Leonardo, the prestigious peer-reviewed journal focusing on the intersections of art, science and technology, has selected my thesis abstract as one of their highest-rated, to be published in their October 2022 issue (Vol 55, No 5). https://leonardo.info/labs-2021
The thesis can be downloaded from https://research.gold.ac.uk/id/eprint/30191/
Completed in the Department of Computing, Goldsmiths, University of London; under the supervision of Dr. Mick Grierson and Dr. Rebecca Fiebrink; funded by the EPSRC.
Title:
Deep Visual Instruments: Realtime Continuous, Meaningful Human Control over Deep Neural Networks for creative expression
Abstract
In this thesis, we investigate Deep Learning models as an artistic medium for new modes of performative, creative expression. We call these Deep Visual Instruments: realtime interactive generative systems that exploit the capabilities of state-of-the-art Deep Neural Networks (DNNs) while allowing Meaningful Human Control in a Realtime Continuous manner.
We characterise Meaningful Human Control in terms of intent, predictability, and accountability; and Realtime Continuous Control with regard to its capacity for performative interaction with immediate feedback, enhancing goal-less exploration. The capability of DNNs that we seek to exploit in this manner is their ability to learn hierarchical representations modelling highly complex, real-world data such as images. Thinking of DNNs as tools that extract useful information from massive amounts of data, we investigate ways in which we can navigate and explore what useful information a DNN has learnt, and how we can meaningfully use such a model in the production of artistic and creative works, in a performative, expressive manner.
We present five studies that approach this from different but complementary angles. These include: a collaborative, generative sketching application using MCTS and discriminative CNNs;
a system to gesturally conduct the realtime generation of text in different styles using an ensemble of LSTM RNNs; a performative tool that allows for the manipulation of hyperparameters in realtime while a Convolutional VAE trains on a live camera feed; a live video feed processing software that allows for digital puppetry and augmented drawing; and a method that allows for long-form storytelling within a generative model’s latent space with meaningful control over the narrative.
We frame our research with the realtime, performative expression provided by musical instruments as a metaphor, in which we think of these systems as not used by a user, but played by a performer.
Description
This research investigates how the latest developments in Machine Learning – with an emphasis on Deep Learning – can be used to create intelligent systems that enhance artistic expression. These are systems that people can interact with and gesturally ‘conduct’ to expressively produce and manipulate text, images and sounds – in effect, collaborating with a ‘creative’, ‘talented’ agent. These systems learn – both offline and online – and have a level of autonomy that could be perceived as creative behaviour.
The desired relationship between human and machine (software) here is analogous to that between an art director and graphic designer, a film director and video editor, or a storyteller and ghostwriter – i.e. a visionary communicates their vision to a ‘doer’ who produces the actual output under the visionary’s direction (though often the doer also shapes the output with their own vision and skills). Crucially, the desired human-machine relationship here also draws inspiration from that between a pianist and piano, a conductor and orchestra, or an abstract expressionist painter and the system comprising brush+paint+canvas+gravity+fluid dynamics. I.e. again a visionary (human) communicates their vision to a system which produces the actual output, but this communication is real-time, continuous and expressive; it is an immediate response to everything that has been produced so far, creating a closed feedback loop.
Within this very broad topic, the key problem area that the research tackles is as follows. Given a very large corpus of example data (e.g. thousands or millions of examples), we can train a generative deep model. That model will hopefully learn something, and contain some kind of ‘knowledge’ about the data (and its underlying structure). The questions are: i) What exactly has the model learnt, and how can we investigate what knowledge the model contains? ii) How can we do this interactively and in real-time, and expressively explore the knowledge that the model contains? iii) How can we use this to steer the model to produce not just anything that resembles the training data, but what *we* want it to produce, *when* we want it to produce it – again in real-time, and through expressive, continuous interaction and control?
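The kind of real-time, continuous steering described in question iii is often realised by navigating a generative model’s latent space. As a minimal, hypothetical sketch (not code from the thesis): each frame, the current latent vector is nudged a small fraction toward a performer-chosen target via spherical interpolation and then decoded, so the output evolves smoothly under continuous control rather than jumping. The `decode` function here is a stand-in for a real generative model’s decoder.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent vectors.
    Often preferred over straight linear interpolation when navigating
    the latent space of models trained with Gaussian priors."""
    z0n = z0 / np.linalg.norm(z0)
    z1n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return z0  # vectors are (nearly) parallel; nothing to interpolate
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

def decode(z):
    """Stand-in decoder; a real system would call the trained
    generative model here, e.g. decoder network forward pass."""
    return np.tanh(z)

rng = np.random.default_rng(0)
z_current = rng.standard_normal(128)   # where we are in latent space
z_target = rng.standard_normal(128)    # where the performer is steering to

# Each "frame", step a small fraction toward the target and decode,
# giving continuous output with immediate feedback (a closed loop:
# the performer can retarget z_target at any moment).
for frame in range(60):
    z_current = slerp(z_current, z_target, t=0.1)
    output = decode(z_current)  # render this frame
```

In a live setting, `z_target` would be updated continuously from gestural input (and the step size `t` mapped to a control), which is what turns the model from something generated *from* into something *played*.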