Artificial Intelligence

Reshaping the Training Infrastructure Behind Frontier AI

Written by Arundhati Kumar

When OpenAI releases a new version of GPT, or when Anthropic ships an update to Claude, the headlines focus on benchmark scores and parameter counts. What rarely makes the news is the painstaking work that happens before training begins: the creation of data environments, reward signals, and evaluation frameworks that determine whether a billion-dollar model will actually do what users need.

This invisible infrastructure is where Sushant Mehta operates. As a Research Scientist at Surge AI, Mehta has been building critical infrastructure for model training, reinforcement learning, and state-of-the-art evaluation systems that are now used by the very frontier models that dominate the industry.

"Everyone wants to talk about the models," Mehta observes. "But the models are downstream of everything else. The real question is: what data shaped them? What reward signals taught them to behave? That's where the leverage actually sits."

Building Research Capability From Zero

Surge AI's trajectory tells a story about how the economics of AI development have shifted. Founded in 2020 and bootstrapped, the company surpassed $1 billion in annual revenue by 2024, serving frontier AI labs. Time Magazine recognized Surge on its TIME100 AI list; Forbes and Inc. profiled its meteoric rise.

Mehta joined Surge to design and implement the first critical infrastructure for model training and evaluation from scratch. He has used his research expertise to serve as a peer reviewer for ECIS 2026, one of the leading conferences on information systems. Mehta also served as a peer reviewer for the 2nd workshop on Human-Centered Recommender Systems at the prestigious 2026 ACM Web Conference.

A New Methodology for Instruction Following Capabilities

One technical challenge Mehta confronted frustrates researchers across the industry: how do you evaluate whether a language model actually follows instructions? Most existing setups provide binary pass/fail judgments, but real-world instruction following is far more nuanced. A model might partially complete a task, handle some constraints but not others, or succeed in single-turn exchanges while failing across longer conversations.

Mehta's response was RLVR (Reinforcement Learning from Verifiable Rewards) with nuanced rubrics. The methodology uses expert-written rubrics as verifiable reward signals for reinforcement learning, providing much more granular feedback than binary benchmarks. Rather than asking whether a model passed or failed, RLVR evaluates performance across multiple dimensions: did the model follow the system prompt? Did it maintain context across turns? Did it handle edge cases gracefully?

"Binary evaluations create a ceiling," Mehta argues. "Once you've saturated them, you have no signal about what to improve next. Rubrics give you a roadmap. They tell you exactly where the model is failing and why."
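The contrast between a binary check and a rubric-based reward can be sketched in a few lines. This is a minimal illustration, not Surge AI's actual implementation: the criterion names and weights are hypothetical, standing in for the kind of expert-written dimensions described above.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str      # hypothetical rubric dimension
    weight: float  # relative importance assigned by the rubric author
    passed: bool   # result of checking the model's response

def rubric_reward(criteria: list[Criterion]) -> float:
    """Aggregate per-criterion checks into a scalar reward in [0, 1]."""
    total = sum(c.weight for c in criteria)
    return sum(c.weight for c in criteria if c.passed) / total

# Illustrative rubric for one model response
rubric = [
    Criterion("follows_system_prompt", 0.40, True),
    Criterion("maintains_context_across_turns", 0.35, True),
    Criterion("handles_edge_cases", 0.25, False),
]

print(rubric_reward(rubric))  # 0.75
```

A binary evaluation would score this response as a flat "fail" because one criterion was missed; the rubric instead yields a graded 0.75 and, just as importantly, names the failing dimension, which is the "roadmap" Mehta describes.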

The Infrastructure Nobody Sees

Developing a methodology is one thing; deploying it at scale is another. Mehta led the work to build infrastructure capable of running large-scale reinforcement learning experiments for RLVR. Unlike traditional RLHF, which relies on human preference judgments that can be inconsistent and expensive, RLVR uses expert-crafted rubrics to provide granular, interpretable scoring. The approach addresses one of the most pressing challenges in AI development: how to systematically improve model performance on complex tasks where simple pass/fail judgments are insufficient.

"The question everyone asks is how to make RLHF more efficient," Mehta notes. "Rubrics-based approaches can substantially improve quality while requiring lower training data volume. That's not just incremental, that's transformative."

Strategic Partner for Frontier Data and Evals

The capabilities Mehta built help make Surge a strategic partner, bringing the company into conversations about model architecture and training strategy, with significant implications for both the company and the industry at large.

But perhaps even more significant are Mehta's plans for enterprise AI.

"Enterprise AI is a different problem than research AI," Mehta explains. "A model that works ninety-five percent of the time isn't good enough when customers depend on it for purchasing decisions. Enterprises need evaluation criteria that map to business outcomes, not academic benchmarks. That's what we build."

The Invisible Hand in AI Development

The broader implications of Mehta's contributions extend far beyond their immediate context. As frontier laboratories integrate the training data produced through Surge AI's methodologies, these enhancements propagate across the entire ecosystem of applications built upon foundation models. These enhanced abilities ultimately manifest in education, medicine, technology, enterprise, creative tools, and innumerable other applications spanning virtually every facet of our economy.

The AI evaluation and training market is projected to grow from $3.59 billion in 2025 to $17.04 billion by 2032, a CAGR of 24.9%. This growth underscores demand for the differentiated services Mehta has helped pioneer, positioning the company to capture significant share as the market expands over the next decade.

Yet for all this impact, the work remains largely invisible to the public. The models get the attention; the infrastructure that shapes them does not.

"I think about it like civil engineering," Mehta reflects. "Nobody notices the water treatment plant or the electrical grid until something breaks. Data infrastructure is the same. If we do our job well, the models just work, and no one thinks about why."

What Comes Next

Looking ahead, Mehta sees the distinction between data companies and model companies continuing to blur. Reinforcement learning is becoming central to AI development, not just for initial training but for ongoing alignment and capability improvement. The organizations that can provide verifiable reward signals will become indispensable strategic partners.

Enterprise deployment represents another growth vector. As Fortune 500 companies move from AI experimentation to production systems, they need evaluation frameworks that translate academic rigor into actionable deployment metrics. Mehta's work could generate substantial value while shaping how large organizations integrate AI into critical business processes.

For now, Mehta remains focused on the fundamentals: building infrastructure that makes models more reliable, more efficient, and more aligned with human intent. It can sometimes be unglamorous work. But it may be the work that matters most.

"The next generation of models won't be defined just by their size," Mehta concludes. "They'll be defined by how well they were trained, by the quality of the data, the precision of the reward signals, the rigor of the evaluation. That's where the real competition is happening."

In an industry obsessed with the models themselves, Sushant Mehta is building the infrastructure that determines what those models become. It is this work that shapes AI systems touching billions of users.
