Kinect – why it matters (2010)

There’s been a lot of buzz on the internet lately – at least in the circles I frequent – about the recently released Microsoft Kinect for Xbox. For those who know nothing about it, it’s a peripheral for Microsoft’s Xbox game console that lets you play games without a game controller. Instead you just move your arms, body and legs, and it tracks and interprets your movements and gestures. The impact this will have on gaming is debatable. The impact it will have on my life, and on many others involved with new media art and experimental visual and sound performance, is a bit more significant. More on that below.

The tracking is made possible by some very clever hardware. It has a normal color camera, similar to a webcam; an array of microphones; an accelerometer; a motor etc.; but most interestingly – at least for me – it has a laser IR projector and an IR camera, which it uses to calculate a depth map. That means that for roughly every pixel in the color image, you can retrieve its distance from the camera. Why does that matter? More on that below.
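To make "a distance for every pixel" a bit more concrete, here is a minimal C++ sketch of what a depth map looks like once it is in memory. Everything in it is my own assumption for illustration: the `DepthFrame` type and `nearestPixel` function are hypothetical names, not part of any Kinect driver, and I'm assuming distances arrive as millimetres with 0 meaning "no reading".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one depth frame (not a real driver type).
// Assumption: distances are stored row by row, in millimetres, 0 = no reading.
struct DepthFrame {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint16_t> mm;

    DepthFrame(std::size_t w, std::size_t h) : width(w), height(h), mm(w * h, 0) {}

    // Distance from the camera of the pixel at column x, row y.
    std::uint16_t at(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        return mm[y * width + x];
    }

    void set(std::size_t x, std::size_t y, std::uint16_t distanceMm) {
        assert(x < width && y < height);
        mm[y * width + x] = distanceMm;
    }
};

// Finds the pixel closest to the camera, skipping pixels with no reading.
// Returns false if the frame contains no valid reading at all.
bool nearestPixel(const DepthFrame& frame, std::size_t& outX, std::size_t& outY) {
    bool found = false;
    std::uint16_t best = 0;
    for (std::size_t y = 0; y < frame.height; ++y) {
        for (std::size_t x = 0; x < frame.width; ++x) {
            const std::uint16_t d = frame.at(x, y);
            if (d == 0) continue;  // no reading for this pixel
            if (!found || d < best) {
                best = d;
                outX = x;
                outY = y;
                found = true;
            }
        }
    }
    return found;
}
```

Given a frame like this, "what is closest to the camera right now?" is a single loop over the pixels, with no lighting setup or background subtraction involved.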

While the Kinect was designed to be used only with the Xbox, within a few hours of it being released its signal was simultaneously decoded by unrelated people around the world, and open-source Linux drivers were released to the public. Others then ported the Linux drivers to Mac and Windows, so everyone could start playing with the hardware on their PCs. A nice brief summary of this period and those involved can be found at […]. To keep it brief I won’t go into details; I’d like to focus on why this matters.

What the Kinect does is nothing new. There have been depth-sensing cameras on the market for quite a while, and some probably with better quality and build. What sets the Kinect apart? Its price. At £130 it isn’t something everyone can go out and buy a handful of, but it is a consumer device. It is a device that most people who want one can either buy, or borrow from someone they know. It is a potential common household item. Anything else on the market that comes close to its capabilities costs significantly more (starting at £2000, jumping up to £4000-£5000+) and, being aimed at industrial businesses, robotics, military etc., is considerably more complicated to acquire and use.

But why does this matter?

For me it’s very simple. I like to make things that know what you are doing, or understand what you are wanting to do, and act accordingly. There are many different ways of creating these things. You could strap accelerometers to your arms and wave them around, and have the accelerometer values drive sound or visuals. You could place various sensors in the environment, range finders, motion sensors, microphones, piezos, cameras etc. Ultimately you use whatever tools and technology you have / create / hijack, to create an environment that ‘knows’ what is happening inside it, and responds the way you designed and developed it to.

What interests and excites me is not the technology, but how you interpret that environment data, and make decisions as a result of your analysis. How intuitive is the interface? Does it behave as you’d expect? You could randomly wire the environmental parameters (e.g. orientation of arm) to random parameters (e.g. audio frequency or speed of video), and it will be fun for a while, but it won’t have longevity if you can’t ultimately learn to play and naturally express yourself with it. It won’t be an *instrument*. In order to create an instrument, you need to design a language of interaction – which is the fun side of interaction design. That is a huge topic in itself which I won’t go into now. The next step is the technical challenge of making sure you can create a system which can understand your newly designed interaction language. It’s all too common to design an interaction, but not have the technical capabilities to implement it – in which case you end up with a system which reports incorrectly and makes inaccurate assumptions, resulting in confusing, non-intuitive interaction and behaviour. The solution? Smarter analysis of course. See if there are better ways of analyzing your data to give you the results you need. A complementary solution is to ask for more data. The more data you have about the environment, the better you can understand it, and the smarter, more informed decisions you can make. You don’t *need* to use all the data all the time, but it helps if it’s there when you need it.

Kinect, being a depth-sensing camera, gives us a ton of extra data over any consumer device in its price range. With that extra data, we are a lot more knowledgeable about what is happening in our environment and can understand it more accurately, so we can create smarter systems that respond more intuitively.

A lot of people are asking “what can you do with Kinect that you couldn’t do before”. Asking that question is missing the point. It depends what exactly “you” means. Is the question “What can I, Memo, do with Kinect that I couldn’t do before?” Or is it “what could Myron Krueger do with Kinect that he couldn’t before?” (answer is probably not much), or is it referring to a more generic “you”?

Kinect isn’t making anything possible that wasn’t already technically possible. It is just making it accessible, not just in terms of price, but also in terms of simplicity and ease. The question should not be “what can you do with Kinect that you couldn’t do before”, but “how much simpler is it (technically) to do something with Kinect which was a lot harder with consumer devices before Kinect”. To demonstrate what I mean, here is a rough prototype I posted yesterday within a few hours of getting my hands on a Kinect.

Kinect is hooked up to my MacBook Pro. I’m using the open-source drivers mentioned above to read the color image and depth map, and I wrote the demo prototype you see above. One hand draws in 3D, two hands rotate the view.
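For a feel of how a depth map could drive a "one hand draws, two hands rotate" switch, here is a minimal C++ sketch. This is not the code from my prototype, just one plausible approach under stated assumptions: hands are whatever is closer to the camera than some threshold, the depth values are millimetres with 0 meaning "no reading", and `countNearBlobs`, `modeFor` and `Mode` are hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Mode { Idle, Draw, Rotate };

// Counts 4-connected blobs of pixels closer than maxMm.
// Assumptions: depthMm is row-major, in millimetres, 0 = no reading.
// Blobs smaller than minPixels are treated as noise and ignored.
int countNearBlobs(const std::vector<std::uint16_t>& depthMm,
                   std::size_t width, std::size_t height,
                   std::uint16_t maxMm, std::size_t minPixels) {
    assert(depthMm.size() == width * height);
    auto isNear = [&](std::size_t i) {
        return depthMm[i] != 0 && depthMm[i] < maxMm;
    };

    std::vector<bool> visited(depthMm.size(), false);
    int blobs = 0;
    for (std::size_t start = 0; start < depthMm.size(); ++start) {
        if (visited[start] || !isNear(start)) continue;

        // Flood fill from this pixel, using an explicit stack.
        std::size_t size = 0;
        std::vector<std::size_t> stack{start};
        visited[start] = true;
        auto visit = [&](std::size_t n) {
            if (!visited[n] && isNear(n)) {
                visited[n] = true;
                stack.push_back(n);
            }
        };
        while (!stack.empty()) {
            const std::size_t i = stack.back();
            stack.pop_back();
            ++size;
            const std::size_t x = i % width;
            const std::size_t y = i / width;
            if (x > 0) visit(i - 1);               // left
            if (x + 1 < width) visit(i + 1);       // right
            if (y > 0) visit(i - width);           // up
            if (y + 1 < height) visit(i + width);  // down
        }
        if (size >= minPixels) ++blobs;
    }
    return blobs;
}

// One hand draws, two (or more) hands rotate the view.
Mode modeFor(int hands) {
    if (hands == 1) return Mode::Draw;
    if (hands >= 2) return Mode::Rotate;
    return Mode::Idle;
}
```

Calling `modeFor(countNearBlobs(depth, 640, 480, 1000, 200))` once per frame would pick the mode. The 640x480 size, the one-metre threshold and the 200-pixel minimum are guesses you would tune for your own setup.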

Without Kinect this is completely possible. You could use high-end expensive equipment, but you don’t even need to. You could use two cheap webcams and make sure you have good control of your lighting; you might need to set up a few IR emitters, and ideally get a clean unchanging background (not essential but helps a lot). And then you will need a *lot* of hairy maths, algorithms and code. I’m sure lots of people out there are thinking “hey what’s the big deal, I don’t find those algorithms hairy at all, I could do that without a Kinect, and I already have done”. Well, smartass, this isn’t about you.

With the Kinect, you pretty much just plug it in, make sure there isn’t any bright sunlight around, and with a few lines of code you have the information you need. You have that extra data that you can now use to do whatever you want. Now that interaction is available for artists / developers of *all* levels, not just the smelly geeks – and that is very important. Once we have everybody designing, creating and playing with these kinds of interactions – people who pre-Kinect would not have been able to – then we will be smothered in amazing, innovative, fresh ideas and applications. Sure we’ll get thousands of pinch-to-zoom-and-rotate-the-photo demos, which will get sickening pretty quickly, but amongst all that will be ideas that you or I would never have thought of in a million years, but that we’ll instantly fall in love with. They will spark new ideas in us, sending us off in a frenzy of creative development, which in turn feeds others, and the cycle continues.

And that’s why it matters.

Of course there are tons of amazing computer vision based projects that were created before Kinect, some created even before computers as we know them existed. It still blows my mind how they were developed. But this isn’t about those super smart people, who had access to super expensive equipment and the super skills and resources to pull off those super projects. This is about giving the tools to everyone, leveling the playing field, and allowing everyone to create and inspire one another.

It’s still very early days. So far it’s mainly been a case of getting the data off the Kinect into the computer, seeing what that data actually is, how reliable it is, how it performs and what we can do with it. Once this gets out to the masses, that’s when the joy will start pouring in 🙂

Thank you Microsoft for making this, and all the hackers out there who got it working with our PCs within a few hours.



Related keywords

c++, computer vision, kinect, open source, openframeworks