VST reveals the process behind the wildly popular AI talking baby video

AI talking babies are hugely popular on TikTok. These videos feature adorable baby faces discussing adult topics and telling funny jokes, but how are they made? And why are they so addictive? VST analysed popular examples to explore the AI tools and production processes behind them.

Audio clips from Theo Von, a well-known American comedian, frequently appear in AI baby podcast videos. Two widely shared videos, with 820,000 and 1.16 million likes respectively, both use original audio clips from Theo Von's podcast.

Theo Von

 

Theo Von's distinctive Southern American accent and slightly erratic and neurotic way of expressing his thoughts are highly recognisable and comically compelling in themselves. Add to that the fact that children sometimes unintentionally say things that are profound or bluntly honest, and when the AI baby repeats Theo Von's jokes with a straight face, some netizens say it sounds more reasonable than when Theo Von himself says them.

The reason is obvious: the content of AI baby podcasts is sourced from popular podcasts or comedy sketches, which already have built-in appeal and have been market-tested, offering high entertainment value and viral potential. Additionally, the original podcasts or comedians already have a large fan base, and reimagining them through the novel format of an AI talking baby effectively expands their reach beyond that existing audience.

Discussions that might have been slightly obscure or niche are unexpectedly made more accessible through the AI babies' interpretations, attracting a broader audience. The stark contrast between the innocent, naive infant image and the worldly, mature speech and behaviour of adults is the core of what captures attention and generates humour. The more adult-oriented the original content, the more humorous and absurd the combination with the infant image becomes.

Many netizens have also said that watching such videos is stress-relieving, as the infant image deconstructs serious adult topics in a light-hearted and playful manner.

In discussions about the specific AI tools behind AI talking baby videos, one called Hedra is frequently mentioned. Based on workflows shared by creators and VST's own testing, the production of AI talking baby videos can be roughly divided into three steps.

 

Step 1: Generate AI baby images

Mainstream AI image-generation tools such as Midjourney, GPT-4o, Gemini, Stable Diffusion, Doubao, and Ideogram can all generate baby images; choose one based on factors such as price, unique features, and strengths. The key is to guide the model with specific descriptions (such as characters, clothing, scenes, and facial features) so that it produces baby images that match your expectations.
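The "specific descriptions" above can be assembled systematically. As a minimal sketch, the helper below combines the four descriptive elements the article mentions into a single prompt string; the field names and template wording are our own assumptions, not tied to any particular image tool.

```python
# Illustrative prompt builder for baby-image generation. The template and
# parameter names are assumptions for this sketch, not any tool's API.

def build_baby_prompt(character: str, clothing: str, scene: str, facial_features: str) -> str:
    """Combine the descriptive elements (character, clothing, scene,
    facial features) into one image-generation prompt."""
    return (
        f"A photorealistic portrait of {character}, wearing {clothing}, "
        f"in {scene}, with {facial_features}, front-facing, studio lighting"
    )

prompt = build_baby_prompt(
    character="a chubby-cheeked baby podcast host",
    clothing="an oversized flannel shirt and headphones",
    scene="a dimly lit podcast studio with a microphone",
    facial_features="big expressive eyes and a serious expression",
)
print(prompt)
```

The same prompt can then be pasted into whichever generator you chose; front-facing framing tends to matter later, since the lip-sync step needs a clearly visible face.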

 

Step 2: Voice acting for AI babies

Once you have created an AI baby character, the next step is giving it a voice. There are two main sources of audio; whichever you choose, the key is whether the voice content itself is engaging, dramatic, and has the potential to go viral.

1. Use existing audio clips

This is currently the most common source of content for AI talking baby videos. Creators search various channels for interesting, funny, and reusable audio material. Sources can include popular podcasts, comedy talk shows, dialogue from classic films or TV series, pop songs, and funny audio clips that have gone viral online. If the material comes from a video platform such as YouTube, tools such as Cobalt can be used to download and extract the audio.
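Downloaded clips usually need trimming to just the reusable segment. Below is a minimal sketch using only Python's standard-library `wave` module; it assumes the clip is an uncompressed WAV (a real MP3/AAC download would first need conversion, for example with ffmpeg).

```python
# Trim a WAV file to the interesting segment using only the stdlib.
# Assumes uncompressed WAV input; compressed formats need conversion first.
import wave

def trim_wav(src: str, dst: str, start_s: float, end_s: float) -> None:
    """Copy the [start_s, end_s] window of src into dst."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        rate = r.getframerate()
        r.setpos(int(start_s * rate))                      # seek to start
        frames = r.readframes(int((end_s - start_s) * rate))
    with wave.open(dst, "wb") as w:
        w.setparams(params)   # nframes in the header is fixed up on close
        w.writeframes(frames)

# Demo with a synthetic 3-second silent mono clip at 8 kHz.
with wave.open("full.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000 * 3)

trim_wav("full.wav", "clip.wav", 1.0, 2.0)

with wave.open("clip.wav", "rb") as r:
    print(r.getnframes() / r.getframerate())  # 1.0 (one-second clip)
```

For real material, replace the synthetic clip with the audio extracted from the downloaded video.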

2. Generate new AI voice

If creators have original text content or want their AI baby to produce specific sounds, they can use AI voice generation tools such as ElevenLabs and the Minimax Speech 02 model. ElevenLabs is known for its high-quality text-to-speech (TTS) and voice cloning capabilities, which can generate very natural and emotional voices.

Whether choosing existing audio or generating new speech, it is important to pay attention to copyright issues. Prioritise content that has been authorised or falls within the scope of fair use. Do not clone someone else's voice for commercial or other illegal purposes without authorisation.
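For the generated-speech route, a TTS request is typically a single HTTP call. The sketch below only builds the request rather than sending it; the endpoint and JSON fields follow ElevenLabs' public REST documentation at the time of writing, so verify them against current docs, and the key and voice ID are placeholders.

```python
# Sketch of constructing a text-to-speech request for a service like
# ElevenLabs. Endpoint and field names are based on ElevenLabs' public REST
# docs at the time of writing; check current documentation before use.
import json

def build_tts_request(api_key: str, voice_id: str, text: str) -> dict:
    """Return the URL, headers, and JSON body for a TTS call."""
    return {
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        "headers": {"xi-api-key": api_key, "Content-Type": "application/json"},
        "body": json.dumps({
            "text": text,
            "model_id": "eleven_multilingual_v2",  # placeholder model choice
        }),
    }

req = build_tts_request(
    "YOUR_API_KEY",                      # placeholder credentials
    "YOUR_VOICE_ID",                     # placeholder voice
    "Babies make the best podcast hosts.",
)
print(req["url"])
```

Sending it (for example with `urllib.request` or `requests`) returns the synthesised audio, which then feeds into Step 3 exactly like a downloaded clip would.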

Step 3: Make the AI baby ‘speak’

Finally, upload the AI baby image and the prepared audio file to Hedra Labs. You can choose the AI video model, aspect ratio, and resolution (up to 720p), and enter a prompt describing the character's emotions and actions. The AI analyses the character's facial features and, following the audio's emotion, rhythm, and pronunciation, applies natural micro-expressions and dynamic changes to the face, synchronising the character's lip movements with the audio.
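The inputs to this step can be summarised as one job description. To be clear: the article describes using Hedra through its web interface, and every field name below is invented for illustration; this does not describe a real Hedra API.

```python
# Hypothetical sketch of Step 3 as a job payload. Hedra is used via its web
# UI in this workflow; all field names here are assumptions for illustration
# and do NOT correspond to a real Hedra API.

def build_talking_baby_job(image_path: str, audio_path: str, prompt: str) -> dict:
    """Bundle the inputs the lip-sync step needs into one job description."""
    return {
        "image": image_path,      # the AI baby portrait from Step 1
        "audio": audio_path,      # the voice clip from Step 2
        "aspect_ratio": "9:16",   # vertical video for TikTok
        "resolution": "720p",     # the maximum mentioned above
        "prompt": prompt,         # desired emotions and actions
    }

payload = build_talking_baby_job(
    "baby_host.png",
    "clip.wav",
    "The baby speaks earnestly, raising its eyebrows on punchlines.",
)
print(payload["resolution"])
```

Thinking of the three steps as image, audio, and a short behaviour prompt makes it easy to batch-produce variations of the same character.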

 

Of course, the application prospects of these technologies extend far beyond this, with potential for widespread use in areas such as game characters and film and television animation. At VST, you can delve deeper into cutting-edge information about AI creative tools.

VST is committed to building a one-stop service platform for TikTok entrepreneurs, offering a comprehensive range of services including information and news, professional consulting, skill training, AI solutions, entrepreneurial support, and business matching, to establish a robust commercial ecosystem that empowers entrepreneurs to grow efficiently and uncover more business opportunities. VST is also deeply involved in the research and development of AI creative tools; its technologies apply not only to translation and content creation but also show great potential in fields such as digital anchors and virtual assistants. Join VST to explore the endless possibilities enabled by AI technology.
