Google Duo uses a new codec to improve call quality on poor connections


While US carriers are busy marketing their new 5G networks, the reality is that the vast majority of people will not experience the advertised speeds. There are still many parts of the US, and of the world, where data speeds are slow, so to compensate, services like Google Duo use compression techniques to deliver the best possible video and audio experience. Google is now testing a new audio codec that aims to substantially improve audio quality over poor network connections.

In a blog post, Google’s AI team details its new very low-bit-rate, high-quality speech codec, named “Lyra.” Like traditional parametric codecs, Lyra’s basic architecture involves extracting distinctive speech attributes (also known as “features”) in the form of log mel spectrograms, which are then compressed, transmitted over the network, and recreated on the other end using a generative model. However, unlike more traditional parametric codecs, Lyra uses a new high-quality generative audio model that can not only extract critical parameters from speech but can also reconstruct speech using minimal amounts of data. The generative model used in Lyra builds on Google’s previous work on WaveNetEQ, the generative model-based packet loss concealment system currently used in Google Duo.
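To make the front half of that pipeline concrete, here is a minimal Python sketch of the feature-extraction step: computing log mel spectrograms from speech audio. It uses librosa purely for illustration; the exact sample rate, frame sizes, and mel resolution Lyra uses are not given in the article, so the parameters below are assumptions.

```python
# Minimal sketch of extracting log mel spectrogram "features" from speech,
# as described above. Parameter choices (16 kHz, 80 mels, 25/10 ms frames)
# are common speech-processing defaults, not Lyra's published settings.
import librosa
import numpy as np

def log_mel_features(path, sr=16000, n_mels=80, frame_ms=25, hop_ms=10):
    """Load speech audio and return a log mel spectrogram (frames x mels)."""
    y, sr = librosa.load(path, sr=sr)      # resample to a speech-friendly rate
    n_fft = int(sr * frame_ms / 1000)      # e.g. 25 ms analysis window
    hop = int(sr * hop_ms / 1000)          # e.g. 10 ms hop between frames
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop, n_mels=n_mels
    )
    return librosa.power_to_db(mel).T      # log-compress; frames as rows

# In a Lyra-style codec, these frames would be quantized, sent at a very low
# bit rate, and fed to a generative model on the receiving end to
# resynthesize the speech waveform.
```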

Lyra’s basic architecture. Source: Google.

Google says this approach puts Lyra on par with the next-generation waveform codecs used in many streaming and communication platforms today. Lyra’s advantage over these next-gen waveform codecs, according to Google, is that it doesn’t send the signal on a sample-by-sample basis, which would require a higher bit rate (and therefore more data). To address concerns about the computational complexity of running a generative model on the device, Google says that Lyra uses a “cheaper recurrent generative model” that works “at a lower rate” but generates multiple signals in different frequency ranges in parallel, which are then combined “into a single output signal at the desired sample rate.” Running this generative model on a mid-range device in real time yields a processing latency of 90 ms, which Google says is “in line with other traditional speech codecs.”
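The multi-band idea can be illustrated with a toy example: several band-limited signals generated in parallel at a low sample rate are brought up to the full rate, shifted into their own frequency ranges, and summed into one output. The sketch below (NumPy/SciPy) uses cosine modulation as a crude stand-in for a proper synthesis filter bank; it is not Google’s implementation, and the band count and sample rates are illustrative.

```python
# Toy illustration of combining per-band signals generated in parallel at a
# low rate into one full-rate waveform. A real codec would use a proper
# synthesis filter bank; cosine modulation is a rough stand-in here.
import numpy as np
from scipy.signal import resample_poly

def combine_subbands(bands, out_sr=16000):
    """Combine len(bands) low-rate band signals into one waveform at out_sr."""
    n_bands = len(bands)
    up = [resample_poly(b, n_bands, 1) for b in bands]  # upsample each band
    n = min(len(x) for x in up)                         # align lengths
    t = np.arange(n) / out_sr
    out = np.zeros(n)
    for i, x in enumerate(up):
        fc = (i + 0.5) * out_sr / (2 * n_bands)         # center of band i
        out += x[:n] * np.cos(2 * np.pi * fc * t)       # shift band into place
    return out

# Four "generated" bands at 4 kHz each (random placeholders for model output),
# combined into a single 16 kHz output signal of ~16,000 samples (~1 second).
bands = [np.random.randn(4000) for _ in range(4)]
output = combine_subbands(bands)
```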

Paired with the AV1 codec for video, Google says that video chats can take place even for users on an old 56 kbps dial-up modem. This is because Lyra is designed to operate in severely bandwidth-limited environments, at bit rates as low as 3 kbps. According to Google, Lyra easily outperforms the royalty-free open-source Opus codec, as well as other codecs such as Speex, MELP, and AMR, at very low bit rates. Google provided voice samples to demonstrate this; with the exception of the Lyra-encoded audio, each of the speech samples suffers from degraded quality at very low bit rates.

[Audio samples: clean speech and a noisy environment, each presented as the original recording alongside very low-bit-rate encodes from the codecs above.]
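The dial-up claim is straightforward arithmetic: at Lyra’s 3 kbps operating point, almost all of a 56 kbps line is left for video and overhead. Only the 56 kbps and 3 kbps figures come from the article; how the remainder splits between AV1 video and protocol overhead is an assumption for illustration.

```python
# Back-of-the-envelope bit-rate budget behind the dial-up claim. Only the
# 56 kbps line rate and Lyra's 3 kbps come from the article; the remainder's
# allocation to AV1 video versus overhead is illustrative.
modem_kbps = 56.0
lyra_audio_kbps = 3.0
video_and_overhead_kbps = modem_kbps - lyra_audio_kbps
print(f"Audio: {lyra_audio_kbps} kbps; "
      f"left for AV1 video + overhead: {video_and_overhead_kbps} kbps")  # 53.0
```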

Google says it trained Lyra “with thousands of hours of audio with speakers in more than 70 languages using open source audio libraries and then verifying the quality of the audio with expert and crowdsourced listeners.” The new codec is already rolling out to Google Duo to improve call quality on very low-bandwidth connections. While Lyra currently targets voice use cases, Google is exploring how to make it a general-purpose audio codec.
