Kinect’s Gesture Interface is Dictation All Over Again
After months of mind-bending Kinect hacks, Microsoft made good on their promise last month and released the Kinect for Windows SDK Beta, opening the floodgates for gesture-enabled everything. Which raises the question - do we want this?
Not long ago Microsoft added a Kinect interface for Netflix, Last.fm, and other multimedia apps to the Xbox 360 (after much popular demand). Users can browse recommended videos, skip tracks, and control playback by voice or by wacky arm waving.
Let’s get this out of the way: I’ve always been awed by the Kinect. The implications of hardware that can accurately provide skeletal tracking and intelligent gesture interpretation are astounding. It is, at the very least, groundbreaking. The first time I waved my hand, clutching a pizza slice, to start a Netflix movie, I had a real “I’m in the future” moment.
But then I thought back to the early days of voice commands on the PC (and now on mobile), a technology that has been with us for years. In 1997, Dragon NaturallySpeaking struck many as a portent of the future in a way eerily similar to the Kinect fad now - it allowed us to interface with computers effortlessly, in the most natural way imaginable. With our voice.
Fast-forward 15 years and voice recognition - though far more powerful and capable than ever - is a niche technology at best. Contrary to Ray Kurzweil’s famous prediction that by 2009 most text would be transcribed via voice recognition, the average consumer is only exposed to this interface when making a Bluetooth call or stumbling in frustration through a tech support tree on the phone. Authors, journalists, and students by and large wouldn’t touch commercial dictation technologies with a 10-foot pole.
The key to the comparison between voice recognition and gesture recognition is to understand that technology is not the issue. For the average speaker, modern voice recognition software provides stunning accuracy and speed. The issue is fidelity. Simply put, human speech is low-fidelity even when interpreted by other humans: nuances of emotion (and hence punctuation and emphasis) are frequently misunderstood. A computer cannot be asked to interpret punctuation as a user dictates because the user’s speech simply does not contain that data. So the dictating user is required to revisit his or her document and arduously edit it for style, punctuation, and mechanics. The technology is no less stunning because of this, but it is simply not convenient enough to be competitive with a manual keyboard.
I believe the same is true of the Kinect. While for certain very narrow applications (bowling games, browsing highly-graphical menus, dance/workout games) it is absolutely ideal and demonstrably superior to an analog controller, the same cannot be said of the broader range of typical computing applications. A shooter requires a level of control fidelity that the human body cannot perform discernibly in a vacuum. A racing game requires not just steering, but braking and acceleration and view manipulation. All this detail must be delivered to the game in near-zero latency, and must be performable by the user as an instinctive reaction.
None of this is to say that Kinect is a bad product or anything less than a stunning piece of technology. However, given the above, I look at the Kinect’s Guinness record as the fastest-selling consumer electronics device and wonder - could the Kinect 2 possibly achieve the same? How many Kinect users never bought another game after they played the bundled copies?
If there’s a future for the technologies behind Kinect, I expect we’ll see it crop up in gesture-enabled televisions that - like voice dial in cell phones - enable users to interact with the system (volume, changing channels, pausing, etc.) with high accuracy but a relatively small set of commands. A TV with gesture recognition built in would let me change the channels with greasy hands or raise the volume without looking for the remote. But you can take this one to the bank: Kinect’s future is absolutely in relatively narrow applications to specific markets and niche needs, not as a universal gaming peripheral or even consumer device.

