Background Noise Suppression in AI

In the age of working from home due to COVID-19, many people have been using video calling programs like Zoom to simulate a lifelike, collaborative work environment. However, as useful as video calling has been, it doesn’t come without challenges. Specifically, the inability to filter out distracting background noise has plagued these programs since their invention. Zoom’s customizable audio settings have allowed it to progress in this area in comparison to competitors like FaceTime and Skype, but not even Zoom can cancel out a noisy vacuum cleaner or a screaming child. While the technology to fully cancel out background noise on video calls is not available right now, companies are making major breakthroughs in improving this feature in hopes that it can not only improve their products, but also be expanded for use in other industries.

Major technology companies like Microsoft are leading the charge on this innovation by applying machine learning to video calling programs. Specifically, they hope to teach machines to distinguish between speech and background noise when they overlap. They approach this by simulating a real call. The model is first familiarized with the signal or “voice” it is trying to focus on. Researchers then attempt to confuse it by playing various types of noise at the same time as the signal. Repeating these exercises with different types of sounds and signals allows the program’s data set to grow significantly. This process is designed to train the machine’s “neural network” so it can be ready for a real background noise canceling situation on a Zoom call or in another capacity.
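
To make the idea of playing noise over a known signal more concrete, here is a minimal sketch of how such training examples might be assembled. It is an illustration only, not Microsoft's or Zoom's actual pipeline: the function names (`mix_at_snr`, `make_training_pairs`, `measured_snr_db`) are hypothetical, and it assumes audio is held as one-dimensional NumPy arrays and that noise is mixed in at a chosen signal-to-noise ratio (SNR) in decibels.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Overlay noise on a clean signal at a requested SNR (in dB)."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Repeat or trim the noise clip so it covers the whole clean signal.
    noise = np.resize(noise, clean.shape)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("clean and noise must both be non-silent")
    # Choose the gain so that clean_power / (gain^2 * noise_power)
    # equals 10 ** (snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


def make_training_pairs(clean, noises, snrs_db):
    """Build (noisy input, clean target) pairs for every noise clip and SNR."""
    return [(mix_at_snr(clean, n, s), clean) for n in noises for s in snrs_db]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture, given the clean signal it was built from."""
    residual = np.asarray(noisy, dtype=np.float64) - np.asarray(clean, dtype=np.float64)
    return 10 * np.log10(np.mean(np.square(clean)) / np.mean(residual ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000           # one second at an assumed 16 kHz rate
    voice = np.sin(2 * np.pi * 220 * t)    # a pure tone standing in for a voice
    vacuum = rng.normal(size=4000)         # random noise standing in for a vacuum
    pairs = make_training_pairs(voice, [vacuum], [0, 5, 10])
    for noisy, target in pairs:
        print(round(measured_snr_db(target, noisy), 2))
```

Each pair holds the noisy mixture a model would hear and the clean signal it should learn to recover. Varying the noise clips and SNR levels is one simple way the data set could grow in the manner described above.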

The ability to properly cancel background noise will not only improve the quality of video calls, but also expand their uses by making calls more lifelike overall. For example, musicians could turn the concept of a “Zoom lesson” from an obligation into a useful tool in their development. To be clear, musicians already use video calling services for lessons, but most try to avoid them at all costs due to concerns over audio quality and whether their instruments will be heard. Video calling services often drop the audio during video lessons because their poor ability to cancel background noise causes the program to cut all sound, including the music being played. Even though successful background noise cancellation would not solve the other audio issues musicians face in online lessons, the consistency it provides would make those lessons more of a rival than a backup to in-person learning.

While the most immediate use for improved noise canceling technology is likely video calls, more opportunities will become available as industries automate. For example, as companies develop and implement conversational AI, it will make its way into parts of everyday life like fast food ordering. If a customer were to order a Big Mac in a crowded area with lots of noise and horns honking in the background, it is imperative that the AI understands them as well as a human would. Similar situations could arise when talking to Siri after a few more updates, or to Alexa if Amazon decides to add it to its Amazon Go grocery stores. Although it may not seem like the most revolutionary technology, solving the mystery of high-quality background noise canceling will allow flashier AI projects like conversational AI to progress and thrive.