Hearing aid technology is poised to take a leap forward. (Unsplash/Mark Paton)
It's called the "cocktail party problem": In a crowded environment with many overlapping conversations, people with hearing aids often find it hard to pick out a single speaker. Scientists at MIT Lincoln Laboratory have proposed a solution, creating a new system that separates two simultaneous speakers while suppressing other noise, an advance that may aid tomorrow's cognitively controlled hearing aids.
"It's definitely a common complaint," Christopher J. Smalt, a human health & performance systems researcher at MIT Lincoln Laboratory and a coauthor of the study, published March 4 in Neural Networks, told The Academic Times. "The background noise and the thing that they're trying to focus on are so similar — it's like the most difficult listening scenario because of that."
The team designed an end-to-end neural network architecture that resembles the systems people used before neural networks, which first entered the field roughly half a decade ago. According to the first author of the study, Bengt J. Borgström, this meant that the researchers did not have to approach the network as a black box — they could think about it more intuitively.
"In the systems people used before neural networks, they would just extract a spectrogram, for example, and come up with some mask to let the target speaker pass through and attenuate everything else," Borgström said. "If we set up the network to mimic that pipeline, then we can play with that mask."
Borgström described a fundamental tradeoff between speech quality and interference. Too much masking gets rid of background noise but also makes it harder to hear the speaker. "You may hear this when you're talking on a cellphone," he said. "They have speech enhancement, but it's being too aggressive. You start hearing these very annoying artifacts — these little blips." On the other hand, without any masking, background noise overwhelms the speaker.
That is why the team incorporated something called a mask continuation factor. As it turns out, people like a little noise in their signal — and the mask continuation factor lets them set the right balance.
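The paper's exact formulation isn't spelled out here, but one simple way to strike that balance is to keep the estimated mask from ever fully zeroing out a time-frequency bin. In the hypothetical sketch below, the `floor` parameter plays the role of that tunable factor.

```python
import numpy as np

def soften_mask(mask, floor=0.1):
    """Blend the estimated mask toward an all-pass mask.

    `floor` is a hypothetical stand-in for the tunable balance Borgström
    describes: 0.0 applies the mask at full strength (aggressive suppression,
    risking audible "blips"), while values near 1.0 leave the original
    mixture, background noise included, largely intact.
    """
    return (1.0 - floor) * mask + floor
```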
The researchers trained their system using a simulated 111-hour corpus, developed from an in-house dataset of 38 male and female speakers. "What we want is something that could conceivably be recorded in a hearing aid," Smalt said. "You could try to rerecord thousands or millions of hours of speech through the actual system that you want to record it from, or you can simulate it, which is what we did in this paper. It's a pretty common thing that people do." They digitally inserted background noise, using recordings of a busy cafeteria.
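A simulated corpus of this kind is typically built by mixing clean recordings with noise at a chosen signal-to-noise ratio. The sketch below shows one common recipe; the function name and default SNR are assumptions for illustration, not details from the paper.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db=5.0):
    # Tile or trim the noise recording to match the speech length.
    noise = np.resize(noise, speech.shape)

    # Scale the noise so the mixture hits the requested signal-to-noise ratio.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return speech + scale * noise
```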
Eight users tested their system by judging the intelligibility of the speech it processed. "We did the tests completely remotely, because of COVID," Smalt said. "We tested it over headphones with a program. People would listen to speech and background noise. There were two people talking at the same time. We told them, 'Listen to the one on the left.'"
Smalt and his colleagues have not yet implemented their innovation in a hearing aid. "That's the next direction," he said. "We start with a computer, and then we can refine it to an actual prototype that's wearable. Right now, we're just simulating that as best we can."
The team hopes their separation algorithm contributes to the development of cognitively controlled hearing aids. First dreamed up several years ago, the devices would use signals from the brain to make the hearing aid focus on sounds the user wants to hear.
"If there are multiple sounds in the environment, the one that you're attending to is most reflected in recordings of your neural activity," said Smalt. "So, it is basically possible to figure out what you want to listen to from EEG." A large European consortium is also working on the problem.
But Smalt thinks thought-controlled hearing aids are still a few years out. "We're trying to take steps toward a realizable system," he said, noting that his team has done some real-time testing of neural signals from users.
Established during the Cold War to build the United States' first air defense system, MIT Lincoln Laboratory now conducts research with wide-ranging applications, including in human health. "We've been looking at human language technologies for many decades," said Smalt. "The health of the population is something they're considering more and more at Lincoln Lab. That includes veterans, many of whom have hearing loss and tinnitus, or ringing in the ears. There's also a lot of new evidence that hearing loss can cause cognitive impairments, because it reduces your ability to pick up on what's going on in your environment."
In their paper, the team envisioned a more elaborate version of their design that could accommodate three or more speakers. "It would be a lot of fun to try to expand it to be more robust," said Borgström. "Especially to have multiple speakers — some arbitrary number of speakers to infinity — and for it to be able to separate each one of them. But it would probably require a more complex network architecture." He also thinks the system would be improved by adding more kinds of background noise.
"As it stands, this is a thing that we can run on a computer, and it's offline," said Smalt. "We think that we implemented it in a way that it doesn't have to be the case. The end goal is something that is real-time and maybe on a smartphone."
The paper, "Speaker separation in realistic noise environments with applications to a cognitively-controlled hearing aid," published in Neural Networks on March 4, was authored by Bengt J. Borgström, Michael S. Brandstein, Gregory A. Ciccarelli, Thomas F. Quatieri, and Christopher J. Smalt, MIT Lincoln Laboratory.