Kinect: How its microphone array works

Posted on: Nov 26th, 2010

The new Xbox 360 peripheral has a microphone array that can separate the voices of the people in front of the device from the other sounds in the environment, so those voices can be used for chat and voice commands. Here you will learn how this difficult task is done.

A microphone array is a set of microphones placed side by side across a surface, all of them recording the sound that arrives from every direction. Kinect has four microphones in a line, three on the left side and one on the right, all placed along the bottom of the device.

Logically, if we put microphones in different places, the sound reaches each of them at a slightly different instant. Taking into account the differences between the signals the microphones pick up and the speed of sound in air, we can calculate where the sound source is. We can tell not only whether the sound comes from one side or the other, but also estimate its position approximately. If you want to know how this is calculated, you can read this scientific paper, although you will need a solid background in physics and mathematics to follow it. Microphone arrays imitate the way our ears work. We have one ear on each side of the head, and when we hear something, our brain estimates where the sound comes from using the phase difference between the waves that reach each ear.
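To make the time-difference idea concrete, here is a minimal Python sketch for just two microphones. It estimates the delay by brute-force cross-correlation and turns it into an angle with the far-field approximation sin θ = c·τ / d. The 16 kHz sample rate, the 10 cm spacing and the synthetic noise signal are assumptions for illustration; this is not Kinect's actual localization code.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 °C

def max_lag_samples(mic_distance, fs):
    """Largest physically possible delay between two mics, in whole samples."""
    return int(mic_distance / SPEED_OF_SOUND * fs)

def estimate_delay(sig_a, sig_b, max_lag):
    """Lag (in samples) that best aligns sig_b with sig_a.

    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    A positive result means the sound reached microphone B later than A.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def direction_of_arrival(lag, fs, mic_distance):
    """Angle in degrees from straight ahead (0) for a distant source.

    Positive angles mean the source is on microphone A's side.
    """
    path_difference = SPEED_OF_SOUND * lag / fs  # extra metres travelled to reach B
    ratio = max(-1.0, min(1.0, path_difference / mic_distance))
    return math.degrees(math.asin(ratio))

# Demo: assumed sample rate (Hz) and mic spacing (m), synthetic broadband sound.
fs, d = 16000, 0.1
rng = random.Random(0)
mic_a = [rng.gauss(0.0, 1.0) for _ in range(400)]
mic_b = [0.0] * 3 + mic_a[:-3]  # the same sound, arriving 3 samples later
lag = estimate_delay(mic_a, mic_b, max_lag_samples(d, fs))
print(lag, round(direction_of_arrival(lag, fs, d), 1))  # 3 40.0
```

Cross-correlation works because the two recordings are most similar when one is shifted by exactly the time the sound needed to cover the extra distance. With only whole-sample lags the angle is coarse, which is one reason real systems interpolate or use more microphones.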

Once the position of the sound source is known, a complex algorithm merges the signals from all the microphones into a single signal that contains the sound coming from an imaginary cone that starts at the device and widens toward us.
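The simplest textbook way to "listen" in one direction is delay-and-sum beamforming: shift each channel so the sound from the chosen direction lines up, then average. The sketch below shows that baseline, not the algorithm Kinect actually uses. The microphone positions in `mic_x` are hypothetical values that only mimic the three-left, one-right layout, and the 30° steering angle is arbitrary.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def steering_delays(mic_x, angle_deg, fs):
    """Whole-sample arrival delays of a distant (plane-wave) source at each mic.

    mic_x: microphone positions along the array axis, in metres.
    angle_deg: 0 is straight ahead; positive angles come from the +x side.
    The microphone the sound reaches first gets delay 0.
    """
    s = math.sin(math.radians(angle_deg))
    arrival = [-x * s / SPEED_OF_SOUND for x in mic_x]  # +x mics hear it first
    first = min(arrival)
    return [round((t - first) * fs) for t in arrival]

def delay_and_sum(signals, delays):
    """Undo each channel's delay and average the aligned channels.

    Sound from the steered direction adds up coherently; sound from other
    directions stays misaligned and is partly averaged away.
    """
    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, delays):
        for i in range(n):
            j = i + d  # sample i of the target sits at index i + d in this channel
            if 0 <= j < n:
                out[i] += sig[j]
    return [v / len(signals) for v in out]

# Hypothetical layout: three mics on the left, one on the right (metres).
mic_x = [-0.11, -0.07, -0.03, 0.11]
fs = 16000
delays = steering_delays(mic_x, 30.0, fs)
print(delays)  # [5, 4, 3, 0]
```

Steering the same recordings toward a different angle just means recomputing `delays`, which is how a fixed row of microphones can point its "cone" at whoever is speaking.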

A filter is also applied: everything outside the frequency range of the human voice (between 80 and 1100 Hz) is removed and the remaining band is amplified, so the ambient noise is filtered out and the voice is boosted.
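A band-limiting stage like this can be sketched as a second-order high-pass at 80 Hz followed by a second-order low-pass at 1100 Hz, using the biquad formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The filter order, the Butterworth Q, the 16 kHz sample rate and the `gain` parameter are assumptions; Kinect's real filtering runs on its own DSPs and its exact design is not described in this post.

```python
import math

def biquad(kind, f0, fs, q=1 / math.sqrt(2)):
    """Normalised 2nd-order low/high-pass coefficients (Audio EQ Cookbook form)."""
    w0 = 2 * math.pi * f0 / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b = [(1 - cw) / 2, 1 - cw, (1 - cw) / 2]
    elif kind == "highpass":
        b = [(1 + cw) / 2, -(1 + cw), (1 + cw) / 2]
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    a0 = 1 + alpha
    a = [a0, -2 * cw, 1 - alpha]
    return [c / a0 for c in b], [c / a0 for c in a]

def apply_biquad(coeffs, x):
    """Run a biquad over a signal (direct form I)."""
    (b0, b1, b2), (_, a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def voice_band(x, fs, low=80.0, high=1100.0, gain=1.0):
    """High-pass at `low`, low-pass at `high`, then scale by `gain`."""
    y = apply_biquad(biquad("highpass", low, fs), x)
    y = apply_biquad(biquad("lowpass", high, fs), y)
    return [gain * v for v in y]

def rms(samples):
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def tone_gain(freq, fs=16000):
    """Amplitude ratio of voice_band for a 1 s pure tone, measured after settling."""
    x = [math.sin(2 * math.pi * freq * n / fs) for n in range(fs)]
    y = voice_band(x, fs)
    half = fs // 2
    return rms(y[half:]) / rms(x[half:])

for f in (20, 300, 5000):
    print(f, round(tone_gain(f), 3))
```

A tone inside the band should come out with a ratio close to 1, while 20 Hz rumble and 5 kHz hiss are strongly attenuated; the `gain` argument then plays the role of "upping the volume" of what is left.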

As if that were not enough, the microphone also removes the echoes produced when the voice bounces off the furniture and walls. To do this it calibrates itself to the reverberation of the room, so if you move the furniture you will have to calibrate it again. The calibration can also be used to train the voice recognizer, because the recognizer needs some samples of your voice in order to learn to recognize it.
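Echo removal is commonly done with an adaptive filter that learns how the room reflects a known sound and subtracts the predicted reflections from the microphone signal. Below is a minimal NLMS (normalized least mean squares) sketch. It assumes a clean copy of the echoed signal is available, such as a test sound played during calibration, and the two-reflection `room` response is invented for the demo; it illustrates the general technique rather than Kinect's own algorithm.

```python
import random

def nlms_echo_canceller(reference, mic, taps=8, mu=0.5, eps=1e-6):
    """Adaptively model the echo path and subtract the predicted echo.

    reference: the clean signal whose reflections we want to remove.
    mic: what the microphone recorded (reflections of reference + anything else).
    Returns (residual signal, learned echo-path coefficients).
    """
    w = [0.0] * taps    # current estimate of the room's echo response
    buf = [0.0] * taps  # recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        buf = [x] + buf[:-1]
        echo_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_estimate  # microphone signal with the predicted echo removed
        norm = eps + sum(b * b for b in buf)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w

# Demo with a made-up room: the sound comes back after 2 samples at 0.6
# strength and after 4 samples at 0.3 strength.
room = [0.0, 0.0, 0.6, 0.0, 0.3]
rng = random.Random(1)
test_signal = [rng.gauss(0.0, 1.0) for _ in range(3000)]
recorded = [sum(h * test_signal[n - k] for k, h in enumerate(room) if n - k >= 0)
            for n in range(len(test_signal))]
residual, learned = nlms_echo_canceller(test_signal, recorded)
print([round(c, 2) for c in learned])
```

The learned coefficients converge to the made-up room response. If the furniture moves, the real response changes and the old coefficients no longer match, which mirrors why the post says calibration has to be repeated.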

Kinect has several digital signal processors (DSPs) that run the complex algorithms needed for these tasks. If the console had to do this work itself, it would consume a lot of resources and leave little processor time to run the games.

Even with all of this, I don't think the sound quality matches that of a microphone placed near the mouth, but it is a lot more comfortable.
