
The recent announcement of a partnership between innovative visual AI software company Lightricks and leading stock media provider Shutterstock opens the door to a new type of relationship, one that could have a significant impact on both sides and on the ecosystems in which they operate.
Lightricks is the first global partner to take advantage of Shutterstock's new research license, using Shutterstock's extensive video content library to train LTXV, Lightricks' new open-source generative video model. LTXV is freely available for developers to experiment with, and it's also being integrated into Lightricks' AI filmmaking web app, LTX Studio.
The deal brings important benefits to both companies. Lightricks gets to use Shutterstock's enormous media archives as vital training data while maintaining its commitment to respecting creator rights. Meanwhile, Shutterstock diversifies its business model, opening up new revenue streams and staying relevant as AI becomes more embedded in creative work.
However, the partnership has a significance that goes beyond the two companies involved. It could be consequential for both the media and AI industries.
If you’ve been following AI news recently, you’ll have seen a rash of announcements about new video AI models and platforms. In the past few weeks alone, OpenAI released Sora, its new video generation tool; Google DeepMind announced Veo 2, a new AI video model available through VideoFX; and Meta shared early information about its upcoming video generator, Movie Gen.
While each of these AI video tools brings new capabilities and impressive results, they all share the same problems that have dogged AI video since its inception. The “uncanny valley” effect still undermines authenticity; object permanence, coherence, and consistency remain patchy; and generation is slow and resource-heavy. The longer the video clip, the more issues arise.
All of these challenges share the same solution: more and more training data. AI video companies need massive amounts of high-quality source material to train their models for greater accuracy and efficacy. But they’re struggling to access it, and the data they do use is often of dubious provenance.
Much of the data in publicly available training datasets has doubtful legal status. An audit by a new group called the Data Provenance Initiative found that over 70% of the data it examined was unlicensed, and that more than half of the licenses that did exist were inaccurate.
This research suggests that even AI developers who think they’re using legal data are exposed to lawsuits from artists and authors, as well as to ethical conundrums. Other AI enterprises aren’t trying as hard. Runway, for example, has been named and shamed for training its models on thousands of YouTube videos, pirated films, and other unlicensed data sources. YouTube has been quite clear that any use of its videos to train AI models violates its terms of service.
It’s evident that many companies have used illegally obtained data for training, and the legal cases are starting to multiply. Stability AI, Midjourney, and DeviantArt have all faced lawsuits from artists alleging copyright infringement over the use of their art. Google is fighting a class-action lawsuit for scraping social media posts to train its large language models (LLMs). Uproar over the Books3 dataset led to its shutdown last year.
But it’s very difficult for AI companies to access legal data, and it’s only getting harder as competition for data rises. Some companies have set up opt-out systems that let creators request that their work not be used for training, but that places the burden on data owners to track down illegal use and file requests on a case-by-case basis.
The newly formed Dataset Providers Alliance (DPA) is backing an opt-in system, which would give much stronger protection to creators. However, given the volume of data AI companies need, licensing enough of it under such a system would be a very expensive undertaking. In practice, smaller companies would be data-starved, reinforcing Big Tech’s dominance of the video AI arena.
This is where the new partnership between Lightricks and Shutterstock becomes so important. It could cut through the thicket of barriers to affordable, accessible, and ethical training data, and build a new ecosystem for media providers.
Lightricks has taken advantage of Shutterstock’s research license, which is far less expensive than a full commercial license. The research license offers Shutterstock’s extensive HD and 4K video library for testing and experimentation, under a revenue-sharing model that directs 20% of the revenue back to creators.
Shutterstock also handles creator consent, inviting contributors to opt out of having their content used for AI training, so Lightricks can rest easy knowing it won’t be surprised by a copyright lawsuit.
This isn’t the first partnership between a video AI company and a media company. In September 2024, film studio Lionsgate announced a deal with applied AI company Runway. However, the focus of that deal is quite different. Runway will train a new AI model on Lionsgate’s proprietary data, using it to augment Lionsgate’s media output. It’s unclear whether other Runway users will benefit from the improved AI model.
Beyond ensuring ethical data use, Shutterstock’s research license strategy helps keep the AI video market open and competitive for startups and entrepreneurs. Because the cost is much lower than that of a commercial license, the terms make it feasible for smaller research teams and innovators to experiment with AI models. This helps prevent the tech giants from maintaining a stranglehold on AI innovation.
If the Lightricks-Shutterstock partnership is successful, we could see a lot of other AI video companies forging similar partnerships with media firms, and a lot more media companies following Shutterstock’s lead to get a foot in the door of a fast-moving vertical that’s likely to transform their business in the next few years.