AIOps (artificial intelligence for IT operations) is the application of artificial intelligence (AI) to enhance IT operations. Specifically, AIOps uses big data, analytics, and machine learning (ML) capabilities to collect and aggregate the huge and ever-increasing volumes of operations data generated by multiple IT infrastructure components, applications, and performance-monitoring tools. It also intelligently sifts ‘signals’ out of the ‘noise’ to identify significant events and patterns related to system performance and availability issues. The combined capabilities of ML and analytics help diagnose root causes and report them to IT for rapid response and remediation or, in some cases, automatically resolve these issues without human intervention.
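To make the idea of sifting signal from noise concrete, here is a minimal sketch of one common approach: flagging a metric sample when it deviates strongly from the samples just before it (a rolling z-score). The function name `find_anomalies`, the window size, the threshold, and the latency figures are all illustrative assumptions, not taken from any particular AIOps product.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate strongly from the preceding window.

    Hypothetical helper for illustration: a sample is flagged when it lies more
    than `threshold` standard deviations from the mean of the previous `window`
    samples.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response times in milliseconds, with one obvious spike.
latencies_ms = [101, 99, 100, 102, 98, 100, 101, 450, 99, 100]
anomalous = find_anomalies(latencies_ms)
print(anomalous)  # [7]
```

Real tools typically combine many such detectors and correlate their output across systems; this sketch shows only the basic filtering step.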
By replacing multiple separate, manual IT operations tools with a single, intelligent, and automated IT operations platform, AIOps enables IT operations teams to respond more quickly, and even proactively, to slowdowns and outages, with far less effort.
It bridges the gap between an increasingly diverse, dynamic, and difficult-to-monitor IT landscape, on the one hand, and user expectations for little or no interruption in application performance and availability, on the other. Most experts consider AIOps to be the future of IT operations management.
However, the world of AIOps presents a duality. On the one hand, it’s an emerging technology that for the first time mashes up operations and AI. On the other, many of the solutions in this space are traditional tools that have been updated to leverage AI. This mix of old and new, traditional players and startups, makes this space particularly interesting. The key highlights of the prevailing AIOps landscape, as described in one report, are as follows.
The AIOps tools in the market today sit on a spectrum with regard to their use of AI. While some make use of knowledge engines systemically in the monitoring and management of cloud and non-cloud systems, most tools treat AI as an afterthought that drives little of the tool's functionality.
Enterprises are typically adopting AIOps as an upgrade to existing ops tools, and they remain brand loyal. This means that the upstarts in the AIOps space will find it difficult to break into a market where the established players are, in essence, all selling the same basic message: AI integrated with management and monitoring that you trust. Considering this, we may see a consolidation next year as the market focuses on a handful of players, down from the two dozen or so relevant players today.
There seem to be two directions in AIOps: self-healing and not self-healing. Some AIOps systems are able to heal issues in the systems they manage or monitor. This means that if the tool finds an issue, a process is launched to attempt to correct the problem, for instance, restarting a server or a network hub. Other solutions are more passive, alerting users about an issue but taking no automated corrective action. The trend is toward active, or self-healing, AIOps tools.
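The difference between the two directions can be sketched in a few lines of code. The example below is a hypothetical illustration only: the `Issue` and `Responder` classes, the `restart_server` function, and the issue names are invented for this sketch and do not correspond to any vendor's API. A passive responder only records an alert, while a self-healing responder first tries a registered corrective action and falls back to alerting if none exists or the action fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Issue:
    resource: str  # e.g. "web-01" (made-up name)
    kind: str      # e.g. "server_unresponsive" (made-up category)


@dataclass
class Responder:
    self_healing: bool
    # Maps an issue kind to a corrective action that returns True on success.
    remediations: Dict[str, Callable[[Issue], bool]] = field(default_factory=dict)
    notifications: List[str] = field(default_factory=list)

    def handle(self, issue: Issue) -> str:
        action = self.remediations.get(issue.kind)
        if self.self_healing and action is not None and action(issue):
            return "remediated"
        # Passive mode, no known fix, or the fix failed: alert a human.
        self.notifications.append(f"{issue.kind} on {issue.resource}")
        return "alerted"


def restart_server(issue: Issue) -> bool:
    # Placeholder: a real tool would call an orchestration or cloud API here.
    print(f"restarting {issue.resource}")
    return True


fixes = {"server_unresponsive": restart_server}
issue = Issue(resource="web-01", kind="server_unresponsive")

passive = Responder(self_healing=False, remediations=fixes)
active = Responder(self_healing=True, remediations=fixes)

passive_result = passive.handle(issue)  # only records a notification
active_result = active.handle(issue)    # runs restart_server first
print(passive_result, active_result)
```

Falling back to an alert when remediation fails keeps a human in the loop for problems the tool cannot fix on its own.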
These tools are all about the data. They store data as they monitor systems and can identify issues that need immediate attention, such as a storage server that is down. They can also deeply analyze historical data to determine trends that may portend a failure or other potential issue. The lifeblood of any AI system is the data needed to train the AI model, and this is the opportunity presented to AIOps tools. Monitored cloud or on-premises systems spin off gigabytes of data each week, and that data can be fed into analytic systems augmented by AI.
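As a rough illustration of using historical data to spot a trend that may portend a failure, the sketch below fits a straight line to daily disk-usage readings and estimates how many days remain before usage crosses a threshold. A simple least-squares line stands in for whatever models a real AIOps tool would use; the function names, the 90% threshold, and the sample readings are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept).

    Requires at least two points with distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_threshold(daily_usage_pct, threshold_pct=90.0):
    """Estimate days from the latest reading until usage reaches the threshold.

    Returns None when usage is flat or falling, and 0.0 when the fitted
    trend has already reached the threshold.
    """
    days = list(range(len(daily_usage_pct)))
    slope, intercept = fit_line(days, daily_usage_pct)
    if slope <= 0:
        return None
    crossing_day = (threshold_pct - intercept) / slope
    return max(crossing_day - days[-1], 0.0)


# Made-up readings: disk usage in percent, one sample per day.
usage = [60.0, 62.0, 64.0, 66.0, 68.0]
remaining = days_until_threshold(usage)
print(remaining)  # 11.0
```

With usage growing two percentage points per day from 68%, the fitted line reaches 90% eleven days after the last reading, which is the kind of early warning a trend analysis can provide before an outage occurs.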
Enterprises that wish to leverage these tools should take care to understand their capabilities, and should also test the tools across both enterprise cloud and non-cloud platforms. Compatibility issues have been reported, most of them discovered after deployment.
Many of these tools are moving to an “on-demand” model, meaning that they will offer cloud-based services. This is an opportunity for those that have, or will have, the majority of their systems on public clouds. However, it may not be a good model for those that still have the majority of systems on-premises.