The audio entertainment market is undergoing a major shift, driven by the integration of artificial intelligence (AI) and advanced language technology. In a recent episode of the Analytics Insight podcast, host Priya Dialani spoke with Prateek Dixit, Co-Founder and CTO of Pocket Entertainment, the parent company behind Pocket FM. The conversation explored how AI is transforming the way audio content is created, translated, and consumed across the globe. With a fast-growing user base and cutting-edge language technology, Pocket FM is setting new standards for AI in entertainment.
Founded in 2018, Pocket FM began with a clear mission: to empower writers to publish high-quality audio content without traditional gatekeepers. Since then, it has grown into a global platform serving over 200 million listeners across more than 20 countries. The platform currently supports 10 languages, including Hindi, English, Tamil, Telugu, Spanish, and German, and streams more than 100 billion minutes of audio annually.
A turning point came with the show ‘Ek Ladki Ko Dekha Hai’, originally in Hindi but adapted across multiple languages with widespread success. According to Prateek, this reinforced a core belief: powerful storytelling transcends language and format.
As CTO, Prateek leads Pocket FM’s technological direction and infrastructure, and his role has evolved to include shaping the company’s AI-first strategy. Central to this mission is empowering writers through technology: building tools that help creators deliver top-tier content quickly and easily, streamlining content creation without compromising quality. Whether in audio, novel, or comic form, the script remains the beating heart of every format.
Internally, the company is building creator-focused AI tools, some of which are slated for public release. These innovations are designed to enhance—not replace—human creativity.
With India alone home to 22 official languages and countless regional dialects, the scale of linguistic diversity is immense. Pocket FM addresses this challenge with an in-house model called Atlas. This AI model can convert content from any base language to another within a single day, drastically reducing manual adaptation time.
This model serves two purposes: creators can expand their reach, and users can enjoy content in their native language. Additionally, a partnership with ElevenLabs has enabled voice synthesis in over 15 languages, with plans to support 30 to 40 languages in the coming years.
The result: a wider reach for creators and more inclusive storytelling for global audiences.
One of the biggest challenges in scaling an audio platform globally is access to rich, high-quality data—especially for new languages. Pocket FM leverages its vast library of user-generated content, including over 10,000 hours per language and more than 100 million content edits.
This gives the company a significant edge, as such refined datasets are not publicly available. While compute and infrastructure limitations remain, Prateek believes these will be commoditized in the near future. The long-term advantage will lie in owning unique, high-quality data to fuel AI models.
The maturity of AI tools such as script generation and voice synthesis varies by language. For widely spoken languages with ample training data, like English, Hindi, and German, Pocket FM achieves up to 95% human-like quality. Long-tail languages, which are often underserved, see accuracy of only around 70–75% and require human quality checks.
To close this gap, the company is working on synthetic data generation. Within two years, it aims to build training data for underserved languages using synthetic techniques, improving AI accuracy and scaling content production across the board.
AI innovation also raises concerns around content authenticity and ethical use. Pocket FM takes a dual-layer approach to address this:
Moderation Layer: AI systems scan all uploads for originality. Any script or voice with over 90% similarity to existing content is flagged and blocked from publishing.
Alignment Layer: Human feedback is integrated into model training so that outputs are emotionally resonant and, according to the company, over 99% unique.
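The episode does not describe how the moderation layer measures similarity, so the following is only a minimal sketch of a threshold check for text scripts, assuming Jaccard similarity over three-word shingles. The function names, the shingle size, and the choice of Jaccard similarity are all illustrative assumptions, not Pocket FM's actual method; only the 90% threshold comes from the conversation.

```python
# Hypothetical sketch of a similarity gate for text uploads.
# Assumptions (not from the source): Jaccard similarity over word
# shingles, shingle size k=3, and all function names.

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all shingles."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def moderate(upload: str, library: list[str],
             threshold: float = 0.90) -> tuple[bool, float]:
    """Return (blocked, best_score) for an upload against a library.

    The upload is blocked when its highest similarity to any existing
    script exceeds the threshold (90%, as stated in the episode).
    """
    upload_shingles = shingles(upload)
    best = max(
        (jaccard(upload_shingles, shingles(doc)) for doc in library),
        default=0.0,
    )
    return best > threshold, best
```

In this sketch, `moderate(script, existing_scripts)` returns a flag and the highest score found, so a flagged upload can be shown to a human reviewer along with the reason. A production system would likely need approximate matching (for scale) and a separate approach for voice similarity, neither of which is covered here.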
Importantly, creators maintain control over what is published. AI tools act as creative assistants, not replacements, ensuring that the soul of storytelling remains intact.
Looking ahead, Prateek sees an AI-powered future filled with interactive and hyper-personalized audio formats. Stories like ‘Ek Ladki Ko Dekha Hai’ could be tailored for millions of users, offering unique choices and outcomes. Content will no longer be mass-distributed but custom-crafted for individual listeners, changing how people consume and connect with stories.
What lies ahead isn’t just more languages or better translations—it’s entirely new forms of storytelling built on AI infrastructure and driven by human creativity.
As the episode concluded, it became clear that Pocket FM isn’t simply using AI to reach more users—it’s building a future where every user, no matter their language or location, can access stories that feel personal and emotionally rich. By combining scalable AI with ethical frameworks and creator-centric tools, Pocket FM is crafting a storytelling experience that is both deeply human and remarkably advanced.
In Prateek Dixit’s words, the mission remains steady: amplifying creativity, not automating it. And with the AI tools they’re building, that mission is just getting started.