Enterprise Cloud Environments Now Move Faster Than the People Managing Them — And That Gap Is Costing Millions

Matvii Horskyi, a jury member at the NextGen Hackathon 2025, helped reduce the operational cost of running enterprise-scale database clusters on a platform managing thousands of them. He explains that the fastest path to cloud savings is not a better vendor deal; it is eliminating the human decisions that make overspending structural.
Written By: Arundhati Kumar

The Uptime Institute's 2025 Annual Outage Analysis found that 85% of infrastructure failures were caused by human error. Now factor in that today's AI-driven stacks can cascade across dozens of interdependent services in milliseconds. The math is uncomfortable: the systems got faster. The humans managing them didn't. 

Matvii Horskyi, a 23-year-old senior backend and distributed systems engineer, has spent more than five years building systems designed to close exactly that gap: the distance between how fast modern cloud infrastructure moves and how slowly humans can manage it. At Qbox, he engineered core cluster automation systems that became foundational to the platform's operational reliability, and he helped build infrastructure that became part of the company's acquisition by Instaclustr in January 2022. At NetApp, he developed production-grade Kubernetes Operators designed to automate end-to-end lifecycle management for distributed data systems in enterprise cloud environments. These operators were built to support multi-million-dollar infrastructure used by organizations such as Atlassian, IBM, and DoorDash through Instaclustr’s managed data platform. His open-source contributions, built on the same principles, are now running in production environments globally.

Last December, Horskyi served as a jury member at the NextGen Hackathon 2025, held at Université Côte d'Azur in Sophia Antipolis, France. He was one of 21 technology experts evaluating distributed systems projects from international teams. The panel assessed each project on three criteria: technical architecture, scalability, and real-world applicability. His assessment of where most organizations stand today is straightforward. The fastest path to cloud savings is not a better vendor deal; it’s eliminating the human decisions that make overspending structural.

When the Automation Has to Be Trustworthy, Not Just Fast

When Horskyi joined Qbox, a managed Elasticsearch platform serving enterprise customers including DoorDash, CBRE, and Yahoo, the company was building its infrastructure from scratch. He designed and implemented the core cluster management layer: the software responsible for every operational event in a cluster's lifecycle, from creation and configuration updates to dynamic CPU and memory scaling, snapshot management, and fault recovery.

The critical technical property he built around was idempotency, ensuring every operation produced identical results regardless of how many times it was retried, even when interrupted mid-execution by network failures or partial state transitions. Without it, automated systems require constant human supervision to catch edge cases, negating much of the value of automation. With it, thousands of cluster lifecycle operations can run concurrently, safely, and without human intervention.
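As a rough illustration of what idempotency means in practice, the following minimal sketch shows a scaling operation that can be retried any number of times with the same result. It is not Qbox's implementation; the `Cluster` class, the `scale_cluster` function, and the `op_id` convention are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for a managed cluster's recorded state."""
    name: str
    cpu: int
    memory_gb: int
    applied_ops: set = field(default_factory=set)


def scale_cluster(cluster: Cluster, op_id: str, cpu: int, memory_gb: int) -> Cluster:
    """Set the cluster to an absolute target size.

    Two choices make this safe to retry:
    1. It writes absolute targets ("set CPU to 8"), never increments
       ("add 4 CPUs"), so repeating the work cannot overshoot.
    2. It records op_id only after the change is fully applied, so a retry
       after an interruption redoes the work and a retry after success
       does nothing.
    """
    if op_id in cluster.applied_ops:
        return cluster  # this operation already completed
    cluster.cpu = cpu
    cluster.memory_gb = memory_gb
    cluster.applied_ops.add(op_id)
    return cluster


cluster = Cluster(name="search-prod", cpu=4, memory_gb=16)
for _ in range(3):  # simulate a client retrying the same request
    scale_cluster(cluster, op_id="scale-001", cpu=8, memory_gb=32)
```

After three attempts the cluster is in the same state it would be in after one, which is the property that lets retries happen without a person checking the outcome.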

"Think about upgrading a production Elasticsearch cluster serving millions of search queries," Horskyi says. "You cannot take it offline. You need rolling upgrades, careful node shutdown coordination, replica management, and continuous health verification. Doing this manually across thousands of clusters is where mistakes happen."

The systems Horskyi built replaced manual, error-prone operations with fully automated, idempotent workflows, eliminating an entire class of human-generated errors from routine cluster operations. The impact was structural rather than incremental: the platform gained the ability to safely run multiple cluster lifecycle operations concurrently, support enterprise-scale workloads with predictable performance, and maintain operational stability under continuous load across AWS, Azure, and GCP.

"The fundamental problem is not that cloud infrastructure is expensive," Horskyi says. "It is that manual management creates compound inefficiency; you are paying for the infrastructure, paying for the people managing it, and paying again when human errors cause outages."

Horskyi also implemented the billing-critical backend logic at Qbox. The system was designed to automatically capture dynamic scaling events at any hour and reflect them directly in customer billing, eliminating the need for manual reconciliation. With thousands of clusters scaling dynamically across the platform, manual tracking was not a viable option. The infrastructure itself had to become the authoritative source of billing truth. That architecture proved to be more than an operational improvement. It contributed directly to Qbox's successful acquisition by Instaclustr in January 2022, a signal that automated, self-reconciling infrastructure carries measurable business value beyond cost savings alone.
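To make the idea of infrastructure as the source of billing truth concrete, here is a small sketch that derives usage directly from a log of scaling events rather than from manual records. The event format, the `cpu_hours` function, and the rate are assumptions made up for this example, not details of the Qbox billing system.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical price, for illustration only.
RATE_PER_CPU_HOUR = Decimal("0.05")


def cpu_hours(events, period_end):
    """Sum CPU-hours from a time-sorted list of (timestamp, cpu_count) events.

    Each event sets the cluster size from its timestamp until the next
    event, or until period_end for the last event.
    """
    total = Decimal(0)
    boundaries = events[1:] + [(period_end, 0)]
    for (start, cpus), (end, _) in zip(events, boundaries):
        hours = Decimal((end - start).total_seconds()) / Decimal(3600)
        total += hours * cpus
    return total


# A cluster that scales up at 2 a.m. and back down at 5 a.m.
events = [
    (datetime(2025, 1, 1, 0, 0), 4),
    (datetime(2025, 1, 1, 2, 0), 8),
    (datetime(2025, 1, 1, 5, 0), 4),
]
usage = cpu_hours(events, period_end=datetime(2025, 1, 1, 6, 0))
charge = usage * RATE_PER_CPU_HOUR
```

Because the charge is computed from the same events that changed the cluster, there is nothing left to reconcile by hand, whatever hour the scaling happened.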

The Same Problem, at a Larger Scale

At NetApp, a NASDAQ-listed global leader in enterprise data management, Horskyi applied the same engineering principles across a broader set of systems. He developed production-grade Kubernetes Operators that automate the full lifecycle management of PostgreSQL, Kafka, Cassandra, Redis, and ClickHouse across environments. The operators were designed and implemented to serve enterprise customers, including Atlassian, IBM, DoorDash, and PubNub.

Kubernetes Operators work by encoding expert operational knowledge (how to deploy a system, scale it under load, and recover it after failure) into automated controllers that run continuously. Rather than waiting for human intervention, these controllers constantly monitor and reconcile the gap between how a system should behave and how it is actually behaving, in real time.
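The reconciliation idea can be sketched in a few lines. The example below is a toy model, not a real Kubernetes Operator and not NetApp's code: `ClusterState`, `plan_actions`, and `reconcile` are hypothetical names, and the "actions" are only printed rather than executed against a cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical, simplified description of a cluster."""
    replicas: int
    version: str


def plan_actions(desired: ClusterState, actual: ClusterState) -> list:
    """Compare desired and actual state and list the steps needed to converge."""
    actions = []
    if actual.version != desired.version:
        actions.append(f"upgrade {actual.version} -> {desired.version}")
    if actual.replicas < desired.replicas:
        actions.append(f"add {desired.replicas - actual.replicas} replica(s)")
    elif actual.replicas > desired.replicas:
        actions.append(f"remove {actual.replicas - desired.replicas} replica(s)")
    return actions


def reconcile(desired: ClusterState, actual: ClusterState) -> ClusterState:
    """Run one reconciliation pass and return the resulting actual state.

    In this toy model every action succeeds immediately; a real controller
    would call cluster APIs and re-observe the state on the next pass.
    """
    for action in plan_actions(desired, actual):
        print("applying:", action)
    return desired


desired = ClusterState(replicas=3, version="8.1")
observed = ClusterState(replicas=2, version="8.0")

observed = reconcile(desired, observed)  # first pass applies two actions
observed = reconcile(desired, observed)  # second pass finds nothing to do
```

The second pass doing nothing is the point: a controller built this way can run continuously and only acts when the observed state drifts from the declared one.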

The challenge Horskyi was solving goes beyond automation. Enterprises operating across multiple cloud providers face a fragmentation problem: each vendor demands its own tooling, expertise, and operational playbook. That expertise is expensive to hire and nearly impossible to retain at scale. The multi-cloud abstraction layer he built addressed this directly, creating a unified control plane that understands the common ground among cloud providers while handling their unique edge cases through concurrency control and safe parallel execution.
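One common way to structure such a layer is a shared interface with one adapter per provider, plus per-cluster locking so that operations on different clusters run in parallel while operations on the same cluster never overlap. The sketch below assumes that design; `CloudProvider`, `AwsProvider`, `GcpProvider`, and `ControlPlane` are hypothetical names, and the adapters return strings instead of calling any real cloud API.

```python
import threading
from abc import ABC, abstractmethod
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class CloudProvider(ABC):
    """Hypothetical common interface that every provider adapter implements."""

    @abstractmethod
    def resize(self, cluster_id: str, cpus: int) -> str:
        ...


class AwsProvider(CloudProvider):
    def resize(self, cluster_id: str, cpus: int) -> str:
        # Provider-specific edge cases would be handled here.
        return f"aws: resized {cluster_id} to {cpus} CPUs"


class GcpProvider(CloudProvider):
    def resize(self, cluster_id: str, cpus: int) -> str:
        return f"gcp: resized {cluster_id} to {cpus} CPUs"


class ControlPlane:
    """Routes requests to the right adapter, one operation per cluster at a time."""

    def __init__(self, providers: dict):
        self._providers = providers
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects the lock table itself

    def resize(self, provider_name: str, cluster_id: str, cpus: int) -> str:
        with self._locks_guard:
            lock = self._locks[cluster_id]
        with lock:  # serialize work on the same cluster only
            return self._providers[provider_name].resize(cluster_id, cpus)


plane = ControlPlane({"aws": AwsProvider(), "gcp": GcpProvider()})
requests = [("aws", "cluster-a", 8), ("gcp", "cluster-b", 16)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda r: plane.resize(*r), requests))
```

Callers only ever talk to `ControlPlane`, so adding another provider means writing one more adapter rather than another operational playbook.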

The scale of what these operators managed is significant. Based on publicly available information about Instaclustr's enterprise customer base, the cloud environments under this infrastructure represent anywhere from multi-million to tens-of-millions of dollars in managed infrastructure value.

The Real Cloud Cost Problem

Enterprise cloud cost conversations happen in finance meetings, not engineering reviews. They focus on what is being spent, not on why spending is higher than it should be. That distinction matters more than most organizations realize. The hidden cost of cloud infrastructure is not the server bill, the raw cost of renting compute power, storage, and processing from cloud providers. It is the operational tax that accumulates every time a human makes a decision that a well-engineered system should make automatically.

Horskyi's trajectory illustrates that the cost pressure will only intensify. As AI workloads grow, cloud vendors are raising prices to match. Organizations that still rely on manual oversight will feel this most. They will pay not just for compute, but for every slow human decision, every delayed response, and every error made under pressure. The engineers who saw this coming built systems to eliminate that exposure. They were not just solving today's problem. They were building the foundation for the next decade of cloud infrastructure.

Most infrastructure leaders watched their cloud bills grow last year. In 2026, those bills are projected to grow again. Horskyi's work offers a concrete answer, but only for those willing to ask the right question. The question is not which cloud provider offers the best discount. The question is whether the systems managing your infrastructure can operate without human intervention. Can they scale, recover, and automatically reconcile billing, or does every critical event still depend on a person who can make a mistake at 2 a.m.? That is where the real cost lives. The savings are not found by spending less. They are found by engineering systems that do not incur avoidable cost in the first place.

Analytics Insight: Top Tech & Crypto Publication | Latest AI, Tech, Crypto News
www.analyticsinsight.net