

In real-world conditions, software is defined not just by its features, but by how it behaves under pressure. Concurrency, partial failures, and unpredictable workloads reveal patterns that remain invisible in controlled conditions. Iaroslav Molochkov, a Senior Software Developer with extensive experience at SberTech, EPAM, and JetBrains, works on precisely these challenges. His work spans coordinating complex migrations, optimizing performance under production load, and designing mechanisms that ensure correctness in distributed systems. In this article, he draws on his project experience to explain how he approaches some of the most common — and most difficult — issues at scale.
Iaroslav Molochkov’s path into engineering began with a strong foundation in mathematics and physics, which later evolved into a focus on software development. One of the key stages in his career was his role as a Principal IT Engineer at SberTech. There, he worked on internal infrastructure projects that required not only strong technical execution but also well-developed soft skills.
A defining initiative during this period was leading the release process for Apache Ignite, an open-source in-memory data grid for high-load environments. Iaroslav managed the full lifecycle — from scope definition to meeting strict performance and stability benchmarks. He also coordinated closely with the developer community to align expectations, resolve disagreements, and navigate validation and voting stages. The result was a high-quality release that delivered measurable improvements for a platform used by hundreds of organizations worldwide.
Later, at EPAM, Molochkov’s focus shifted further toward system architecture. Working with a major European DIY retailer, he made significant contributions to transforming a legacy monolithic system into a more scalable architecture. As part of this effort, he designed and delivered core services supporting critical business operations under high load. His strong focus on reliability, efficiency, and long-term maintainability allowed the platform to grow without a proportional increase in complexity or infrastructure costs.
Today, Iaroslav Molochkov is a Senior Software Developer at JetBrains, a global software company whose products are used by millions of developers and organizations worldwide. His work focuses on cloud integrations and helps keep these tools fast, stable, and reliable even under demanding enterprise conditions.
Molochkov is also actively involved in evaluating technological innovation at the industry level. In particular, he was selected as a judge for the Business Intelligence Group’s Stratus Awards for Cloud Computing in 2025 and the CES Innovation Awards in 2026 — both highly selective, internationally recognized programs that bring together experts to assess cutting-edge technologies.
At JetBrains, one of the projects that helped further establish Molochkov’s expertise was the migration of mission-critical platform components to a new version of a shared API, where Iaroslav played a leading role. The two versions were incompatible, and the components were essential to both internal operations and external users. The change was so extensive that any misstep could have resulted in company-wide disruption or delivery delays.
Molochkov led the most technically demanding phase of the migration, established the implementation approach later adopted by other engineers, and delivered the change on schedule without incidents. “Migrating a codebase of this scale means keeping multiple interrelated components and their contracts in sync,” Iaroslav explains. “Given the scope and importance of these components, the work required a highly careful approach. Planning — both for the migration itself and for what would follow it — was crucial.”
Based on this experience, he outlines a set of practical recommendations for teams working on similar projects.
Before the migration, Molochkov advises identifying all critical use cases, dependent systems, and user flows most likely to be affected so regressions in key areas can be detected early. It is also essential to define clear baselines and success criteria in advance, along with a rollback plan — including specific behaviors, metrics, and failure conditions that would trigger a rollback.
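The idea of pre-agreed rollback conditions can be made concrete with a small sketch. The metric names and thresholds below are illustrative only, not taken from the original project; the point is that the triggers are written down and checked mechanically, rather than debated mid-incident:

```python
# Sketch of a rollback-trigger check against pre-defined baselines.
# Metric names and thresholds here are hypothetical, for illustration.

BASELINE = {"p99_latency_ms": 250, "error_rate": 0.001}

# Conditions agreed on before the migration that would trigger a rollback.
ROLLBACK_THRESHOLDS = {
    "p99_latency_ms": lambda current: current > BASELINE["p99_latency_ms"] * 1.5,
    "error_rate": lambda current: current > BASELINE["error_rate"] * 10,
}

def should_roll_back(current_metrics: dict) -> list:
    """Return the list of violated conditions; non-empty means roll back."""
    return [
        name for name, violated in ROLLBACK_THRESHOLDS.items()
        if name in current_metrics and violated(current_metrics[name])
    ]

# Healthy metrics: no violations, migration proceeds.
print(should_roll_back({"p99_latency_ms": 260, "error_rate": 0.0012}))
# Degraded metrics: both conditions fire, rollback is triggered.
print(should_roll_back({"p99_latency_ms": 900, "error_rate": 0.05}))
```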
During the migration, Molochkov emphasizes preserving public contracts wherever possible. Public interfaces should change only when truly necessary, and even then teams should favor backward-compatible approaches. Internal changes, by contrast, can be introduced behind the scenes as long as they do not disrupt consumers. If incompatibilities or defects emerge, each fix should be evaluated not only for correctness but also for potential side effects, such as increased latency or throttling. Iaroslav also recommends carrying out migrations incrementally to contain risk and validate each stage before a broader rollout.
After the migration, step-by-step validation should follow. Molochkov suggests starting with isolated environments, then moving to internal deployment, and gradually exposing the system to production traffic. Throughout this process, close monitoring of key metrics — including the so-called “golden signals” (latency, traffic, errors, and saturation) — together with logs helps ensure that any issues are detected early and addressed before they escalate.
One area Molochkov is currently working on is distributed systems — a domain he describes as fundamentally defined by trade-offs and constant adjustment. Some systems prioritize high availability, while others require stronger consistency guarantees. Each case demands a different balance.
A simple example is Kafka’s producer configuration. Often you can improve throughput by tuning parameters such as linger time and batch size to increase batching before transmission. This reduces the number of network round-trips and improves efficiency, but it can also increase latency, as records may wait longer on the producer before they are sent. The optimal configuration ultimately depends on workload characteristics and project requirements.
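The trade-off can be seen even in a toy model. The simulation below is not Kafka's actual batching logic; it only mimics the two knobs (echoing Kafka's `linger.ms` and `batch.size`): a batch is sent when it is full or when the oldest buffered record has waited long enough:

```python
# Toy model of producer batching: records arrive at a fixed interval; a
# batch is flushed when it reaches batch_size records or the oldest
# buffered record has waited linger_ms. Illustrative only, not Kafka code.

def simulate(arrival_interval_ms, n_records, batch_size, linger_ms):
    sends = 0
    total_wait = 0.0
    buffered = []  # arrival times of currently buffered records
    for i in range(n_records):
        now = i * arrival_interval_ms
        buffered.append(now)
        if len(buffered) == batch_size or now - buffered[0] >= linger_ms:
            sends += 1
            total_wait += sum(now - t for t in buffered)
            buffered = []
    if buffered:  # flush the remainder at the end of the run
        now = (n_records - 1) * arrival_interval_ms
        sends += 1
        total_wait += sum(now - t for t in buffered)
    return sends, total_wait / n_records  # (round-trips, avg wait per record)

# No batching: one send per record, zero added latency.
print(simulate(2, 100, 1, 0))    # (100, 0.0)
# Batches of 10: a tenth of the round-trips, but records wait on average.
print(simulate(2, 100, 10, 50))  # (10, 9.0)
```

The same tension appears in real configurations: larger batches amortize network overhead, while the linger time bounds how long any record can sit waiting for the batch to fill.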
Beyond performance, one of the most challenging aspects of distributed systems is ensuring correctness under concurrency and partial failure. Issues rarely arise from a single point of disruption. Instead, they emerge at the boundaries between components, where timing, race conditions, and incomplete operations intersect.
Caching is one of the clearest examples of this dynamic. While it improves performance by reducing load on underlying systems, it also introduces additional copies of state that may temporarily diverge. Iaroslav advises acknowledging this limitation early and designing accordingly. Rather than aiming for perfect synchronization — which is often too costly — it is more effective to limit the impact of inconsistencies.
This can be achieved through robust invalidation or refresh strategies, as well as versioning or fencing mechanisms to prevent stale writes — situations where outdated data overwrites newer, correct data. TTL (time-to-live) can serve as a safety measure by limiting how long stale data survives, but it should not be treated as the primary correctness mechanism. Finally, choosing the appropriate cache pattern for each access path is equally important.
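The versioning-plus-TTL idea can be sketched in a few lines. This is a minimal in-process illustration, not a description of any particular cache product: a write is applied only if it carries a newer version than the cached entry, so a delayed, stale update cannot clobber fresher data, while TTL acts only as a backstop:

```python
import time

# Sketch of a versioned cache entry: writes carry a version number and a
# stale write (version <= cached version) is rejected. TTL bounds how long
# an entry survives but is not the primary correctness mechanism.

class VersionedCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}  # key -> (version, value, expires_at)

    def put(self, key, version, value):
        current = self._data.get(key)
        if current is not None and current[0] >= version:
            return False  # stale write rejected
        self._data[key] = (version, value, time.monotonic() + self.ttl)
        return True

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or time.monotonic() >= entry[2]:
            return None  # missing or expired
        return entry[1]

cache = VersionedCache(ttl_seconds=30)
cache.put("user:1", version=2, value="fresh profile")
# A delayed writer holding older data arrives late and is rejected.
accepted = cache.put("user:1", version=1, value="stale profile")
print(accepted, cache.get("user:1"))  # False fresh profile
```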
A similar level of nuance applies to distributed coordination, which helps orchestrate concurrent work, limit duplicate execution, and protect shared resources under contention. In one project, Iaroslav implemented a Redis-based coordination mechanism to coalesce duplicate parallel calls and reduce unnecessary load on downstream AWS services. The solution relied on per-key leases with unique ownership tokens and TTL to coordinate identical requests efficiently.
Molochkov also had to address the previously mentioned risk of stale writes. To mitigate it, he combined the Redis update and lease release into a single atomic Lua script. The script verified lease ownership before applying the update and releasing the lease, ensuring that only the current owner could finalize the Redis state change. However, where external side effects are involved, the target system must still enforce conditional updates or fencing-style token checks to prevent stale workers from overwriting newer results.
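The invariant behind that design — release succeeds only for the current lease owner — can be sketched in-process. This is not the Redis/Lua implementation described above; here a Python lock stands in for Redis's single-threaded command execution, which is what made the real check-and-delete atomic:

```python
import threading
import time
import uuid

# In-process sketch of a per-key lease: acquire stores a unique owner
# token with a TTL; release is a compare-and-delete that succeeds only
# for the current owner. The lock stands in for Redis's single-threaded
# execution (the real version did the check-and-delete in one Lua script).

class LeaseStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._leases = {}  # key -> (token, expires_at)

    def acquire(self, key, ttl_seconds):
        """Return an ownership token, or None if a live lease holds the key."""
        with self._lock:
            entry = self._leases.get(key)
            if entry is not None and time.monotonic() < entry[1]:
                return None  # someone else holds a live lease: coalesce/skip
            token = uuid.uuid4().hex
            self._leases[key] = (token, time.monotonic() + ttl_seconds)
            return token

    def release(self, key, token):
        """Delete the lease only if `token` still owns it."""
        with self._lock:
            entry = self._leases.get(key)
            if entry is None or entry[0] != token:
                return False  # lease expired and was re-acquired by another owner
            del self._leases[key]
            return True

store = LeaseStore()
token = store.acquire("refresh:user:1", ttl_seconds=5)
duplicate = store.acquire("refresh:user:1", ttl_seconds=5)  # coalesced: None
print(duplicate is None, store.release("refresh:user:1", token))
```

A worker whose lease expired mid-task gets `False` from `release` and knows it must not publish its result — the same signal the fencing-style token checks provide for external side effects.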
Despite their benefits, Molochkov notes that such approaches can easily become an anti-pattern when used as a blanket solution to concurrency issues. “They shouldn't replace sound data modeling, idempotent operations, conditional updates, or ownership partitioning,” he says.
Ultimately, these challenges cannot be fully eliminated. You can only manage them through careful design, explicit trade-offs, and a deep understanding of how systems behave under real-world conditions — an approach Iaroslav both applies in his work and encourages others to develop.