Meet Your New Co-Host: How AI Will Transform Podcasting
Imagine a feature so versatile it can empower businesses, elevate side hustles, and spark creativity for hobbyists — all while making everyday tasks smoother and smarter. I’ve been diving into this game-changing capability, and the potential is nothing short of remarkable. In this post, I’ll reveal one of its most groundbreaking applications and show how it’s set to redefine what’s possible in podcasting.
Context
OpenAI recently released the “advanced voice mode” capability in ChatGPT, along with the Realtime API, which allows for speech-to-speech conversations. The same capability is available on Microsoft Azure through Azure AI Studio with the GPT-4o-Realtime-Preview model.
Let’s dive into why this new capability is such a breakthrough. Previously, to achieve speech-to-speech capability, you had to stitch together multiple steps. First, you would transcribe the spoken audio to text using a speech-to-text service. Then you would run that text through a language model to generate a response. Finally, you’d convert the model’s response back to speech using a text-to-speech service. This multi-step process is more tedious to implement and can degrade the experience: latency adds up across the steps, and the emotion in the speaker’s voice is lost once the audio is reduced to text.
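To make the older multi-step approach concrete, here is a minimal sketch in Python. The three stage functions (`transcribe`, `generate_reply`, `synthesize`) are hypothetical stand-ins that I made up for illustration, not calls to any real service. The point is only to show how each turn passes through three separate hops, each adding its own delay, and how tone of voice is dropped at the text stage.

```python
import time
from dataclasses import dataclass


@dataclass
class TurnResult:
    reply_audio: bytes
    stage_seconds: dict


def transcribe(audio: bytes) -> str:
    """Hypothetical speech-to-text stub: treats the audio bytes as UTF-8 text."""
    # Only words survive this step; tone and emotion are discarded here.
    return audio.decode("utf-8")


def generate_reply(text: str) -> str:
    """Hypothetical language-model stub: returns a canned reply."""
    return f"You said: {text}"


def synthesize(text: str) -> bytes:
    """Hypothetical text-to-speech stub: encodes the reply text as bytes."""
    return text.encode("utf-8")


def cascaded_turn(audio: bytes) -> TurnResult:
    """Run one conversational turn through all three stages, timing each one."""
    timings = {}
    value = audio
    stages = [("stt", transcribe), ("llm", generate_reply), ("tts", synthesize)]
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - start
    return TurnResult(reply_audio=value, stage_seconds=timings)


result = cascaded_turn(b"Who won the main event?")
print(result.reply_audio.decode("utf-8"))
print(sorted(result.stage_seconds))
```

With real services in place of the stubs, the total wait before the listener hears a reply is roughly the sum of the three stage timings, which is why this design tends to feel slower.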
With the new Realtime API, that chain collapses into one step. You stream audio in and get audio back directly from a single model, which enables more natural conversational experiences, all through ONE API instead of three separate services. You can read more about the announcement and features here.
How AI Can Be the Ultimate Co-Host for Your Podcast
Now, let me get down to brass tacks. I’m not only an AI enthusiast and practitioner, but I’m also a hardcore professional wrestling fan. As a kid (and maybe sometimes as an adult), I role-played wrestlers cutting promos (monologues to promote themselves) and commentators calling wrestling matches. I also listen to several wrestling podcasts, such as Something to Wrestle with Bruce Prichard and Conrad Thompson (a must-listen for any wrestling fan!). As I was listening to their podcast, I thought about how cool it would be to have AI co-host a podcast using the new advanced voice capability.
You can evaluate this on Microsoft Azure with the Real-Time audio playground in Azure AI Studio. For this test, however, I used OpenAI’s ChatGPT app on my iPhone with gpt-4o. I was really impressed with the results, which I have shared below along with my demo. Note: at the end of the demo, I was amazed by the questions “Becky” came up with on her own. I did not guide her on those, and I answered them on the fly!
Results & Optimization
Here are some insights and best practices that I learned to drive positive results as an AI co-host for a podcast…
The conversational nature of ChatGPT was phenomenal. Before starting the podcast, I gave ChatGPT context in my prompt to get better results (this setup was not included in the demo above). Below are some things I implemented:
- Give it a name — I gave my ChatGPT a name to sound more realistic. I called it “Becky”, paying homage to one of the most popular and successful women’s wrestlers of all time — Becky Lynch.
- Set the stage — I provided very specific context to Becky of what the goal was (simulating a live podcast) and what I needed Becky to do. For example, I specified the topic of wrestling in the late 1980s, and that we’d be referencing Hulk Hogan and Macho Man’s rise to prominence and downfall as a team.
- Provide specifics on speaking — I told Becky to sound realistic rather than perfectly polished, to be casual, to show emotion, and to keep answers short. (You can also select one of the preset voices. I selected “Sol”. There are currently 9 options.)
- Have your Co-host ask you questions — I told Becky to ask me questions so that we would have a natural conversation on a specific topic. This helps ensure that the podcast does not turn into a one-way conversation of only me asking Becky questions.
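If you want to reuse this setup across episodes, the four tips above can be captured in a small prompt builder. This is only a sketch of how I might structure it. The function name `build_cohost_instructions` and the `max_sentences` limit are my own illustrative choices, and the wording is not the exact prompt I used in the demo.

```python
def build_cohost_instructions(name, topic, talking_points, max_sentences=3):
    """Assemble co-host instructions from a name, a topic, and talking points.

    max_sentences is an illustrative knob for answer length, not a model setting.
    """
    lines = [
        # Give it a name and set the stage.
        f"You are {name}, the co-host of a live podcast about {topic}.",
        "Be ready to discuss: " + "; ".join(talking_points) + ".",
        # Provide specifics on speaking.
        "Speak casually and with emotion. Do not sound perfectly scripted.",
        f"Keep each answer to about {max_sentences} sentences or fewer.",
        # Have the co-host ask questions.
        "Ask the host a question from time to time so the conversation goes both ways.",
    ]
    return "\n".join(lines)


instructions = build_cohost_instructions(
    name="Becky",
    topic="professional wrestling in the late 1980s",
    talking_points=[
        "Hulk Hogan and Macho Man's rise to prominence as a team",
        "their downfall as a team",
    ],
)
print(instructions)
```

You would paste the resulting text into the chat before starting the conversation, or pass it as the instructions when using an API.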
Model Knowledge was great — this was very impressive. The level of detail that Becky was able to provide from hints or inferences was spot on. Becky used terms like “heel” (which means “bad guy” in wrestling speak). I did not need to be very explicit in guiding Becky during the actual podcast, and that made the result sound a lot more natural.
Creativity — when I asked Becky where the WrestleMania 5 main event ranks in Macho Man’s career, Becky wove in a historical reference to his WrestleMania 3 match, which is also regarded as one of the greatest matches of all time. This made Becky’s comments come across as informed and relevant.
Handling interruptions — being able to interrupt Becky (ChatGPT) adds a lot of realism, since that is how podcasts actually play out. I was able to interrupt Becky mid-sentence, and Becky would adapt to whatever direction I took.
Multi-person Podcast — In my demo, it was just Becky and me; however, you can do this with multiple humans along with Becky. This would be even more engaging, with Becky listening in and naturally contributing to the conversation.
Things to be mindful of:
Pausing — There was a slight pause (~1–2 sec) between when I finished a sentence and when Becky spoke. It’s very reasonable given that I did this from my iPhone. I also did some testing through Azure AI Studio, where the response times were similar but can be made faster by lowering the silence duration parameter, which controls how long the model waits after you stop speaking before it responds. Subsequent tests I ran were also faster. Many factors that I will not get into here can affect latency; I just wanted to share my results at the time of my demonstration. Your results will vary.
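For anyone using the API rather than the app, here is a rough sketch of what adjusting the silence duration might look like. As far as I understand the Realtime API preview, this is done by sending a `session.update` event whose `turn_detection` settings include `silence_duration_ms`. Treat the exact field names as an assumption to check against the current API reference. The helper name and the 300 ms value are purely illustrative.

```python
import json


def build_turn_detection_update(silence_duration_ms: int) -> dict:
    """Build a session.update event that changes only the pause length.

    Field names follow my reading of the Realtime API preview and may change.
    """
    if silence_duration_ms <= 0:
        raise ValueError("silence_duration_ms must be positive")
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                # How long the server waits in silence before it replies.
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }


event = build_turn_detection_update(300)  # 300 ms is an illustrative value
payload = json.dumps(event)  # the text you would send over the open connection
print(payload)
```

There is a trade-off here: a shorter silence window makes the co-host respond faster, but it also makes it more likely to jump in while you are just pausing to think.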
Longer answers — Some answers were still a bit long even though I asked for short answers in the prompt. I’ll need to play with this a little more to get the right depth of response for my use case.
Repetitive phrases — Becky loved to say “What do you think, Marc?” quite a bit. This can be tweaked with parameter tuning and additional prompt techniques, so it is totally solvable.
Optimize for your Use case — the first time I tested this, I found that Becky would give away the outcome of the match in her first response. I added instructions to the prompt telling Becky not to give away the outcome of a match right away and to have at least one back-and-forth exchange before discussing it, unless I specified otherwise. It worked well.
Conclusion
We’ve only begun to explore the potential of the advanced voice mode capability. Having AI as a podcast co-host, using advanced voice mode or the Realtime API, opens up many new possibilities. It is cutting-edge technology. Imagine a co-host that could appeal to all audiences, discuss nearly any topic with a wealth of knowledge, and interact in a natural conversational style, all in real time. It could transform the podcasting game with captivating, interactive, and entertaining experiences for years to come.