Alibaba is leading a $290 million investment in Chinese startup ShengShu, signaling a major shift in the artificial intelligence race. The move highlights the limits of ‘large language models’ trained primarily on text; instead, developers are increasingly turning to ‘world models’ built on video and real-world physical scenarios.
ShengShu said the latest funding will support the development of a “general world model that uses AI to bridge two currently separate domains: the digital world of games and AI-generated video and the physical world of autonomous driving and robots.”
Zhu Jun, founder of ShengShu, added in a statement, “ShengShu believes that a general world model, built on multimodal data such as vision, audio, and touch, more naturally captures how the physical world works than large language models.” He also believes that AI systems trained on world models can more consistently predict real-world behavior.
The move underscores the industry's growing recognition that, despite their popularity, LLMs have real weaknesses. While adept at generating text and powering chatbots, they struggle with real-time interaction and with understanding the physical world. That gap becomes especially pronounced when AI is applied to robotics and other systems that must perceive and act in their physical surroundings.
It also reflects a broader trend in the AI industry toward applications that are not merely language-based but multimodal and capable of taking action. If they succeed, world models could mark the next stage of artificial intelligence, embedding AI throughout the physical world.