"You sound drunk af": on the challenges in Generative Speech Restoration

14:05 - 14:35, 23rd of May (Thursday) 2024 / DATAMASS STAGE

This talk explores Generative Speech Restoration (GSR), a nascent field with its own challenges and opportunities. Using generative AI, GSR turns low-quality speech recordings into output that sounds like a professionally produced podcast. The presentation opens with a brief history of machine learning in audio enhancement, then introduces the principles of GSR and the current state of the art. Examples will illustrate GSR applications, and strategies for improving model performance will be discussed. The session will also cover what to monitor to avoid common pitfalls, above all the need to preserve both high speech quality and intelligibility. Related topics will be covered as well, including evaluation methods such as WVMOS, ITU-T P.808, and NISQA. Finally, the talk will present current business use cases of GSR, showing its practical applications and impact. Attendees will leave with a comprehensive overview of where GSR stands today and where it is heading in audio processing technology.

TRACK:

AI/ML

Karol Duzinkiewicz

DAC.digital