NEW YORK, USA — The global sound agency Listen has created an interactive audio experience powered by Microsoft AI technology that takes fans behind the scenes of “Adrenaline,” a single off X Ambassadors’ new album, The Beautiful Liar.
The digital experience is narrated by a custom-designed, synthetic voice modeled after X Ambassadors’ Sam Harris. The band’s lead singer plays the role of “The Shadow,” a sinister character based on the voice inside your head, a theme explored throughout the high-concept album. The AI version of Sam Harris/The Shadow guides visitors to the site through an interactive audio story that uses voice-activated prompts to uncover insights behind the making of the song.
To create the experience, Listen used Microsoft’s Custom Neural Voice technology to capture and digitally recreate Sam Harris’ voice. Over multiple recording sessions, the Listen team recorded Harris reading more than 1,500 sentences, about 8 hours of dialogue. Within Microsoft’s Azure cloud, Listen trained the AI on Harris’ recordings, creating a natural-sounding representation of his character’s voice. Using Microsoft’s Text-to-Speech technology, the synthetic voice can reply to visitors in real time.
“It was a grueling but interesting process to experiment with AI and the potential it has for storytelling,” said Sam Harris of X Ambassadors. “I had to read hundreds of lines of text for the AI to map my voice, read in the tone of voice of The Shadow. Hearing it back, I was so surprised at how accurately it captured the cadence!”
The idea was to create a companion to X Ambassadors’ album: a behind-the-scenes audio documentary about the song “Adrenaline” using a synthetic voice that can interact with fans. “Oftentimes, you’ll see this type of content with artists explaining how a song came together,” said Jordan Rothlein, Listen’s senior podcast director, “but this project takes it further, where you can actually, as a listener, ask questions, and then get information back, where it feels more like you’re really getting closer and closer to the story and the artist.”
As the song’s lyrics flash across the screen, the music pauses and Harris’ “Shadow” character prompts the listener to interact with the voice. Over the course of the exchange, the voice tells the story of the song and details behind-the-scenes moments of the production. Visitors to the site can also access additional audio clips and images.
Listen led the creative development and production of the project, collaborating with its sister agency, A_DA, to produce the digital experience. The idea behind the “Adrenaline” project grew out of Listen’s interest in bringing more interactivity to podcasts. “We’ve been thinking a lot about the future of podcasts, audio storytelling, and audio experiences more broadly,” said Paul Amitai, executive strategy director at Listen. “We’re exploring different ways to incorporate interactivity and real-time response into audio content that’s determined by where, when, and how listeners are engaging.”
Brands, in particular, have an opportunity to differentiate themselves using voice-enabled experiences, said the team. “With the rapid growth of voice technology — smart speakers and voice assistants across mobile, in-home, and in-vehicle — voice is becoming more important and prevalent in terms of how people engage with brands,” said Amitai. “Building a sonic brand strategy for voice that’s distinct and consistently deployed can help brands stand out in an expanding and increasingly competitive audio-first domain.”