AIOps tools are gaining popularity in 2019, indicating a shift toward a modern IT strategy that incorporates machine intelligence to better support digital transformation.
Wondering if your organization needs AIOps? We’ve written this guide to cover everything you need to know about this emerging space including guidance on selecting the right tool to meet the requirements of your business.
This guide will help you answer key questions throughout your research and selection process, including:
- What is the purpose of an AIOps solution?
- What are the main features of an AIOps platform?
- How can I use advanced analytics for day-to-day IT operations?
- What benefits could this technology bring to my business?
- How do AIOps tools differ from one another?
To keep things organized, we’ve divided this post into chapters. You can skip to a section by clicking on a link below.
Don’t have time to read the whole guide or want something quick to refresh your memory later? Download our free quick start guide on the business case for AIOps.
Quick Start Guide: The Business Case for AIOps
Introduction: The Origin of AIOps
Chapter One: Breaking Down the AIOps Acronym
Chapter Two: AI in the Enterprise
Chapter Three: The Business Case for AIOps
Chapter Four: AIOps Features vs. Platforms
Chapter Five: Getting Started
Conclusion: AIOps and the Future of IT
Artificial intelligence for IT operations, or AIOps, is a category of business software that uses analytics and machine learning to improve IT’s ability to aggregate, analyze, and act on big data.
In traditional IT, each silo (network, infrastructure, and application) uses different monitoring tools that generate alarms when something’s wrong. Since most of these tools are confined to the datasets of their specialization, they can’t tell the complete story of what’s happening in the environment. This makes it hard to isolate and fix the root cause of a problem, such as the failure of a business-critical service. The infrastructure tool says it’s the network’s fault, the network tool blames the application, and on and on. This kind of finger-pointing is unproductive and costly for organizations moving at digital speeds.
What if instead of silos, we created a unified information stream for greater context? Then, using machine intelligence, aggregated a focused feed that correlated data from all the silos—more data than a human could process—and analyzed that feed so that IT could easily understand the most important events happening across their environment and why?
That is the purpose of AIOps tools: discover and act on meaningful insights in the environment to help IT run more efficient operations, make better decisions, and support business productivity.
ITOA vs. AIOps
AIOps may be a new buzzword, but it evolved from an existing category of IT software: IT operations analytics (ITOA). Like AIOps platforms, ITOA tools aggregate data from different sources and apply big data analytics to extract insights.
AIOps expands on ITOA in three primary ways:
- Ingesting more kinds of data
- Processing real-time data in addition to historical data
- Introducing machine learning to help analyze growing data sets
Why Is Interest in AIOps Growing?
Interest in the term picked up in 2018 and is still rising, as you can see in this Google Trends chart, which displays search interest over time.
Figure 2: Google Trends chart showing global search interest for AIOps
However, if you compare it with an established industry term like ITSM, you can see there’s still a way to go before AIOps becomes mainstream.
Figure 3: Google Trends chart showing global search interest for AIOps (blue) and IT Service Management (red)
While we believe that Gartner’s research and hype from vendors have boosted the term’s online presence, it’s helpful to understand some of the challenges IT departments face that make AIOps an attractive solution:
- Difficulty getting actionable insights out of monitoring tools
- Noisy and incomplete data
- Difficulty performing root cause analysis
- High volume of persistent and recurring issues
- High ticket volumes (especially for low-level issues like application problems, WiFi connectivity, slow login, and printing problems)
- Pressure from the business to do more without increasing staff
- Difficulty addressing issues proactively
- Productivity loss and decreased employee satisfaction due to poorly performing technology
- Loss of visibility into the environment as organizations adopt SaaS and cloud services
Many are skeptical of technologies labeled as AI. Artificial intelligence, after all, has been around since the 1950s and there have been waves of excitement and disappointment in it since then. But practical applications of AI are becoming more mainstream with technology like self-driving cars, face recognition, and digital assistants being used or encountered by millions of people around the world every day.
These examples are a tiny fraction of how AI is being used, but they show the diversity of technologies available today, which raises the question: What do companies mean when they say they offer AI?
To answer this question, it’s important to highlight the differences between AI and machine learning (ML). AI is a broad category of ways that machines can imitate human intelligence. Often treated as a synonym for AI, ML is a subset of AI that describes the algorithms used to train machines to identify and surface patterns in data. (Many resources on this topic offer similar definitions, but these envelope explainers from MIT Technology Review on AI and ML are fun if you want to know more.)
One example of a technology that uses multiple subfields of AI is the Google Assistant, which combines voice recognition, natural language processing, Google’s knowledge base, and ML to enable human-machine conversations and perform tasks.
Overwhelmingly, AIOps platform vendors refer to the technology underlying their software as ML. Vendors are using ML along with other advanced analytics to find patterns, detect anomalies, identify incidents of interest, and predict the likelihood of events.
Other aspects of AI may be incorporated into AIOps tools as well, such as natural language processing and sentiment analysis, but these technologies don’t directly relate to the core function of the platforms.
Is AIOps Only for IT Operations?
The short answer is no! Other groups that can benefit include the service desk, DevOps, InfoSec, and business leaders. We cover what some of these use cases look like in chapter three.
Organizations are deploying AI across departments to help facilitate digital transformation. According to IDC, global spending on AI systems is expected to climb to $77.6 billion in 2022, more than triple the amount IDC forecast for 2018.
A “2018 State of AI in the Enterprise” report from Deloitte found that 59% of executives across all lines of business report adopting software with baked-in AI. Common implementations include CRM and ERP software.
Deloitte’s survey also offers some findings on how business leaders are thinking about AI in the enterprise:
- 42% of respondents listed optimizing internal operations as a top benefit they achieved from their AI implementation
- 39% ranked data issues (e.g. data privacy, accessing and integrating data) in their top three AI challenges
- 43% listed “making the wrong strategic decisions based on AI” as one of their primary concerns
AI Adoption in IT Operations
AIOps is a small piece of the business AI trend. Despite being the backbone of organizations’ technology, IT is lagging behind other departments that have not only implemented AI but have started to reap its benefits.
IT leaders are showing interest in AI, but discussions are in early stages. Gartner says that “most infrastructure and operations leaders are in the Zone of Learning today, assessing the benefits of deploying technologies like chatbots, AIOps and virtual support agents in areas such as IT service management (ITSM) and IT infrastructure and application monitoring…” (research available to Gartner subscribers).
In a survey of IT leaders, Gartner found that 18% report currently using AI/ML to analyze big data with another 42% planning to implement this by the end of 2019, while 41% say they have no plans to use AI/ML within the next two years (see Gartner graphic below).
AI’s Impact on the Workforce
Are robots coming for IT jobs? It’s hard to predict how AIOps tools will affect existing IT jobs and hiring, but it seems probable that some roles will be augmented, some lower-level roles may be refocused, some roles will require additional training, and some tasks may require hiring employees with new skills.
For example, robust implementations for the service desk may reduce the need to hire more Level 1 technicians due to lower ticket volumes for basic incidents. Instead of an end user calling IT support to have an issue resolved and the support agent suggesting a restart, the end user’s computer could automatically recognize a set of conditions that necessitates a restart and prompt the action.
However, not all problems can or should be solved automatically. Consider a situation where an issue is attributed to a bug within an application, which no number of machine restarts or preset actions could fix. These kinds of issues must be escalated to the appropriate level of support or engineering.
At the same time, organizations may need to hire staff for more sophisticated AIOps implementations. The more IT wants to discover and automate, the more need there will be for personnel qualified to assess data quality, train algorithms, interpret results, and identify scenarios worth automating.
A recent Ponemon survey of IT security professionals found mixed opinions on whether automation will reduce the number of positions in their field (with 35% believing hiring will decrease and 40% believing it will increase). However, the majority agree that AI will improve their organization’s ability to monitor threats and allow staff to focus on higher-priority tasks.
“Firefighting,” “keeping the lights on,” “battling zombie hordes...” whatever you call it, traditional IT is a defensive game, addressing incidents after they turn into problems.
Imagine a football team that took a defense-only approach. The best outcome they could hope for is a final score of 0–0 or for their opponent to accidentally score on themselves. Absurd, right? Yet this is essentially what IT is doing by operating reactively.
Reactive IT isn’t only stressful for those solving the problems; it also negatively affects end-user experience. Often, interactions between IT and end users are discussed as adversarial, not unlike a home team with disenchanted fans.
It’s time for IT teams to step up their offense. AIOps enables a strong IT offense by surfacing relevant insights faster and thinking several steps ahead, laying the groundwork for IT to act swiftly, prevent problems, and stop existing problems from spreading.
One area where proactive IT can have a huge impact is IT support. As little as an hour of downtime caused by a widespread IT problem can cost an enterprise over $100k. With AIOps and automated resolutions, organizations can shift their support cycle left to resolve problems at the endpoint, greatly reducing downtime and IT personnel involvement.
In addition to IT operations, AIOps can support the efforts of the service desk, InfoSec, and DevOps teams along with business leaders. Different AIOps tools will cater to these groups with the biggest distinction being what kinds of data the tool can ingest.
Top AIOps Use Cases
1. Event Correlation
Understanding how one event relates to another is key to troubleshooting problems. For instance, if a user is experiencing an issue, correlation will show IT whether an event like a software upgrade occurred around the same time that could be related.
Gartner predicts, “By 2020, 90% of traditional, domain-specific event correlation and analysis tools will fail to provide accurate monitoring and root cause analysis, leading to high costs, and low productivity from excessive false positives” (research available to Gartner subscribers).
AIOps builds on traditional approaches to event correlation and analysis (ECA) in the following ways:
- Reducing noise in data by eliminating false alarms and duplicate events
- Introducing new data sources for more thorough investigation
- Enabling real-time and predictive correlation
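To make the first and third points more concrete, here is a minimal sketch of noise reduction and time-window correlation. It is an illustration under our own assumptions, not how any particular AIOps product works: the `Event` fields, the five-minute duplicate window, and the thirty-minute correlation window are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    # All field names are hypothetical, chosen for illustration only.
    source: str      # monitoring domain, e.g. "network" or "application"
    host: str        # system the event was observed on
    kind: str        # e.g. "software_upgrade", "app_crash"
    time: datetime


def deduplicate(events, window=timedelta(minutes=5)):
    """Drop repeats of the same (source, host, kind) seen within `window`.

    The timestamp of the last occurrence is refreshed on every repeat,
    so a continuous storm of identical alarms collapses to one event.
    """
    kept = []
    last_seen = {}
    for ev in sorted(events, key=lambda e: e.time):
        key = (ev.source, ev.host, ev.kind)
        prev = last_seen.get(key)
        if prev is None or ev.time - prev > window:
            kept.append(ev)
        last_seen[key] = ev.time
    return kept


def correlate(events, target, window=timedelta(minutes=30)):
    """Return events on the same host that occurred up to `window` before `target`."""
    return [
        ev for ev in events
        if ev is not target
        and ev.host == target.host
        and timedelta(0) <= target.time - ev.time <= window
    ]
```

With a software upgrade followed ten minutes later by two near-identical crash events on the same machine, `deduplicate` would keep a single crash and `correlate` would surface the upgrade as a candidate related event. Real tools weigh many more signals than host and time, so treat this as the simplest possible version of the idea.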
2. Anomaly Detection
One drawback of static alarms is that they are based on standard performance thresholds, which may be too sensitive (or not sensitive enough) for different environments. Say an alarm was configured to trigger when a system took longer than a minute to boot up but an organization supporting legacy POS machines knew that boot time was only an issue at two minutes. They could change the time limit on the alarm to make reporting more useful and reduce noise.
While fine-tuning alarms can get IT closer to obtaining useful insights, it’s a slow process that requires constant tweaking to stay accurate and is difficult to maintain across hundreds of alarms. Alternatively, anomaly detection works by comparing current data to historical trends and notifying IT only when unusual behavior is observed. For example, an AIOps tool could report when a browser extension is experiencing a significantly higher load time than normal on a given system. This could clue IT into whether an upgrade is necessary or if the extension poses a security risk.
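The difference between a static alarm and a historical baseline can be sketched in a few lines. The example below uses a simple z-score test, which is only one of many possible anomaly detection techniques and is our assumption rather than a description of any vendor’s algorithm. The function name and the threshold of three standard deviations are illustrative.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical boot times (seconds) for a legacy POS machine.
boot_history = [112, 118, 109, 121, 115, 117, 111]
```

A static one-minute alarm would fire on every one of these boots. Against the baseline, a 119-second boot is treated as normal for this machine, while a 200-second boot is flagged, with no manual threshold tuning required.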
3. Predictive Analytics
Like anomaly detection, predictive analytics uses historical data to forecast the likelihood of events. One application of predictive analytics is resource planning for virtual desktops. Analysis of historical consumption and correlated end-user experience trends can be used to anticipate upcoming resource demands.
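As a rough illustration of forecasting from historical consumption, the sketch below fits a straight line to past observations and extrapolates it. A least-squares trend line is an assumption on our part and is far simpler than what a production tool would use (it ignores seasonality, for instance). The function name and the sample figures are hypothetical.

```python
def linear_forecast(values, steps_ahead):
    """Fit a least-squares line to equally spaced observations and
    extrapolate `steps_ahead` periods past the last one."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical average memory use (GB) per virtual desktop, one value per month.
monthly_memory_gb = [10, 12, 14, 16]
```

Here `linear_forecast(monthly_memory_gb, 2)` projects demand two months past the last observation, which could feed a capacity planning discussion before users feel the shortfall.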
4. Root Cause Analysis and Reactive Support
Traditional monitoring tools may only provide a piece of the puzzle when it comes to root cause analysis. By combining data from multiple sources, AIOps gives IT a better chance of pinpointing the true root cause of an issue. This will result in faster triaging (i.e. the issue is escalated to the appropriate level of support) and drive down mean time to repair (MTTR). AIOps tools that store historical data will also allow IT to investigate back in time to when the source of the problem occurred.
5. Proactive Support
Proactive support is the act of solving a problem before a user is impacted. In addition to accelerating root cause analysis, analysis of streaming data enables IT to become aware of incidents before they develop into problems. By resolving issues proactively, IT can deliver a better quality of service, prevent costly downtime, and boost end-user experience.
6. Help Desk Optimization
Related to proactive support, automated or assisted healing refers to scripted actions that auto-resolve problems within the environment. Examples of automation include disk cleanup, system and application restarts, and fixing corrupted WMI.
For the service desk, AIOps represents a new opportunity to extend the shift-left model to Level 0 where problems are resolved before entering the traditional support cycle. This approach deflects incidents, reduces ticket volumes, improves MTTR, and lowers support costs. It also frees up service desk staff to focus on higher-value projects and eliminates the need to deal with certain common, low-level incidents.
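One way to picture Level 0 automation is as a set of rules that pair a detected condition with a scripted action. The sketch below is hypothetical throughout: the `Remediation` structure, metric names, and thresholds are our own assumptions, and the actions are placeholders that return a description instead of touching a real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Remediation:
    name: str
    condition: Callable[[Dict[str, float]], bool]  # evaluates endpoint metrics
    action: Callable[[], str]                      # placeholder for a real script
    needs_approval: bool = False                   # prompt the user instead of acting


def plan_remediations(metrics, rules):
    """Return (rule name, mode) for each rule whose condition matches,
    where mode is "auto" or "prompt"."""
    plan = []
    for rule in rules:
        if rule.condition(metrics):
            mode = "prompt" if rule.needs_approval else "auto"
            plan.append((rule.name, mode))
    return plan


# Hypothetical rules; thresholds are illustrative, not recommendations.
RULES = [
    Remediation(
        "disk_cleanup",
        lambda m: m.get("disk_free_pct", 100) < 10,
        lambda: "cleaned temporary files",
    ),
    Remediation(
        "restart_system",
        lambda m: m.get("uptime_days", 0) > 30,
        lambda: "restart scheduled",
        needs_approval=True,
    ),
]
```

Separating the plan from the execution keeps disruptive actions, such as a restart, behind a user prompt while low-risk cleanups run on their own. Anything that matches no rule still goes through the normal support cycle.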
Benefits of AIOps
1. More Value from Big Data
Organizations have long recognized the valuable stories contained in the data they generate every day and have adopted tools to aggregate and analyze that data. However, finding actionable information from big data has been challenging due to its variety, volume, and velocity. By applying ML, IT departments can learn more from the data they generate. While the need for tools that monitor different parts of the stack won’t go away, AIOps can turn what was once noise into meaningful insights for better decision-making.
2. Smoother Operations
AIOps can help companies avoid costly downtime and improve mean time to detect (MTTD) and MTTR through faster root cause analysis, proactive support, automated actions, and predictive analytics.
3. Cost Reduction
By shifting left, organizations save time and support costs while freeing agents to focus on higher-priority tasks.
4. Increased IT Effectiveness
AIOps ultimately allows IT employees to focus more on tasks that require human problem solving and empathy, such as projects with many moving parts and communicating with end users to help improve IT processes in ways that machines couldn’t imagine. While this focus on higher-level tasks may require retraining, employees will no longer have to deal with large volumes of routine incidents, which should make their jobs more engaging.
Because AIOps will enable faster and more informed decision-making, IT will benefit from increased productivity and more strategic business decisions.
5. Improved End-User Experience
AIOps should reduce the amount of time end users spend with IT issues thanks to proactive resolutions, faster reactive support, and automation. Better end-user experience means that employees can be more productive, and frustrations caused by technology will be less frequent.
While the specifics vary by industry, better end-user experience typically results in better products or services for the end consumer. For example, a physician can spend more time on care rather than waiting for charts to load (or worse, falling back on paper methods due to outages), customer service agents can accommodate more tickets and requests instead of waiting on a computer to boot up, and so on.
Roadblocks to Adoption
Achieving the transformative benefits of AIOps requires understanding and planning for its limitations. Remember our earlier definition of the primary components of AIOps: big data and ML. AIOps tools will ingest and process the data you send to them, meaning that if you feed them low-quality data, expect low-quality insights.
Because data is so important to success, two main roadblocks to AIOps implementations stand out: 1) a shortage of data science skills and 2) inadequate data sources/quality.
1. Skills Shortage
Success with machine learning necessitates data science skills. Most organizations will have to hire individuals with such skills or invest in employee training.
According to Gartner, “Machine learning promises much. However, without data science skills, it will be difficult for I&O [Infrastructure & Operations] leaders to realize that promise beyond the basic use cases of event correlation and anomaly detection” (research available to Gartner subscribers).
Additionally, the automation component of AIOps may require individuals with the time and capability to write scripts for executing custom actions.
2. Data Sources and Quality
Are you already monitoring across all domains (application, network, infrastructure, end-user experience)? If not, start there before moving on to AIOps. End-to-end monitoring offers more complete visibility, which will result in better AIOps insights.
Auditing the quality of your data is also key for ensuring the tool’s accuracy.
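A data quality audit can start small. The sketch below counts missing required fields and exact duplicate records in a batch of monitoring data. The record layout and the choice of checks are assumptions for illustration. A real audit would likely also look at staleness, units, and coverage across domains.

```python
def audit_records(records, required_fields):
    """Summarize missing required fields and exact duplicates
    in a list of dict records."""
    missing = {field: 0 for field in required_fields}
    seen = set()
    duplicates = 0
    for rec in records:
        for field in required_fields:
            # Treat absent keys, None, and empty strings as missing.
            if rec.get(field) in (None, ""):
                missing[field] += 1
        # Build a hashable fingerprint of the whole record.
        key = tuple(sorted((k, str(v)) for k, v in rec.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"total": len(records), "missing": missing, "duplicates": duplicates}
```

Running a check like this against each feed before connecting it to an AIOps tool gives a quick read on whether the tool will be learning from signal or from noise.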
AIOps platforms aggregate and store data collected by application performance monitoring (APM), network performance monitoring and diagnostics (NPMD), digital experience monitoring (DEM), and IT infrastructure monitoring (ITIM) tools. They then analyze the data (both historical and real-time) using advanced analytics and machine learning to reduce noise, detect anomalies, discover patterns, and forecast events. These insights are presented as visualizations within the platforms and can be tied to actions to prevent or resolve incidents autonomously or by prompting a user to act.
The current market for AIOps tools isn’t limited to full platforms. Monitoring vendors have also begun integrating AIOps features into their existing toolsets. Full platforms will typically serve the needs of a broader audience, whereas feature-enriched monitoring tools enhance analytics for their domain specialists.
Key Components of an AIOps Platform
- Vendor-agnostic data ingestion and storage that can handle multiple data sources, such as log data, wire data, text data, metrics, and APIs
- ML and other advanced analytics, anomaly detection, pattern discovery, and predictive analytics
- Real-time (streaming) and historical (trending) presentation of information
Integrating AIOps with Digital Experience Monitoring
Lakeside’s approach incorporates AIOps features into our digital experience monitoring (DEM) solution, SysTrack. By combining AIOps with DEM, SysTrack surfaces insights that help IT operations and service desk teams remediate and prevent problems at the endpoint.
Gartner offers this assumption (research available to Gartner subscribers):
“In 2023, large enterprise exclusive use of artificial intelligence for IT operations and digital experience monitoring tools to monitor modern applications and infrastructure will rise from 5% in 2018 to 30%.”
Here’s a basic overview of how SysTrack AIOps works:
1. Data Collection
SysTrack continuously collects a tremendous amount of data (10,000+ data points every 15 seconds) on all physical and virtual endpoints in the environment. This data includes log data, metrics, and third-party APIs, forming a complete picture of end-user experience, usage, and performance. Our patented distributed database architecture makes this collection almost undetectable, with the agent consuming less than 1% of a system’s CPU.
2. Automated Problem Detection
High-quality data forms the foundation of SysTrack AIOps. From there, we use sensors to evaluate the data at the endpoint. A next generation of alarms, sensors are unique language expressions that outline conditions and KPIs for problem detection. Sensors were created to solve known pervasive and preventable issues affecting end-user computing environments, drawing on Lakeside’s 20+ years of experience crafting enterprise IT software along with input from partners such as IBM and SysTrack customers. Each sensor has a unique description, including a severity level, that tells you why something is happening and what to do about it.
3. Pattern Discovery
SysTrack AIOps uses pattern algorithms to detect groups of related sensors. These intelligent groupings accelerate root cause analysis by aggregating the insights IT needs to draw conclusions.
SysTrack also incorporates natural language processing (NLP) and AI-driven sentiment scoring. The integration of NLP powers an easy-to-use search function that enables IT to ask SysTrack anything about their environment. SysTrack then surfaces the most relevant dashboard for faster investigation. Organizations can even integrate this insight engine into other software with SysTrack’s NLP training tool.
Sentiment scoring powered by IBM Watson helps IT process qualitative feedback from end users gathered through SysTrack surveys. The ability to examine both objective and subjective perspectives is important for creating a holistic view of end-user experience. An application of this technology could be surveying users who were provisioned new laptops. Rather than analyze each result, sentiment scoring is a quick way to assess whether responses are positive, negative, somewhere in between, or mixed. IT teams can compare this data with SysTrack’s end-user experience score, which is calculated based on what percentage of time users are able to work without being impacted by IT problems. Perhaps the computing experience of the new laptops is good, but their form factor makes them awkward for travel or their lack of ports makes dongle management a hassle. Monitoring software couldn’t have told you those things, but they’re valid components of end-user experience all the same.
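To illustrate the idea of a score based on the percentage of unimpacted time, here is a deliberately simplified calculation. This is not SysTrack’s actual formula, which is not described in this guide. The function name and inputs are hypothetical, and the sketch assumes that active time and impacted time are already measured in minutes.

```python
def experience_score(active_minutes, impacted_minutes):
    """Percentage of active time not impacted by IT problems (0 to 100)."""
    if active_minutes <= 0:
        raise ValueError("active_minutes must be positive")
    # Clamp impacted time into the valid range before computing the share.
    impacted = min(max(impacted_minutes, 0), active_minutes)
    return 100.0 * (active_minutes - impacted) / active_minutes
```

For an eight-hour day (480 minutes) with 24 minutes lost to IT problems, this simplified score works out to 95. Comparing a number like that with survey sentiment is what reveals the gaps that monitoring alone would miss.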
Want to know more about SysTrack AIOps? Request a free demo.
Ready to start getting better insights from your data? Use these questions as a guide for assessing your organization’s readiness and evaluating vendors’ solutions.
- Do you have established implementations of application performance monitoring (APM), network performance monitoring and diagnostics (NPMD), digital experience monitoring (DEM), and IT infrastructure monitoring (ITIM)?
- Does your team include data scientists, or the data science skills needed to support an implementation?
- Have you clearly defined what use cases you would implement and your expected benefits?
Questions to Ask AIOps Vendors
- Is the tool a standalone AIOps platform that can be purchased on its own or is it a monitoring platform with added AIOps features?
- What kinds of data can the AIOps tool ingest? Will the data be of adequate breadth and depth for your implementation?
- Does the tool charge based on the amount of data ingested? If so, how can you optimize data feeds to avoid duplication or unnecessary costs?
- How does the tool use machine learning? What other advanced analytics are involved? What skills are needed to support them?
- Do the analytics and visualizations fit the needs of your implementation (i.e. are they designed to be used by the service desk, IT ops, a specific industry, etc.)? What use cases does the tool enable?
Back when AIOps was just starting to enter the IT lexicon, we wrote about what AI could mean for the future of IT. That vision is still taking shape, but there is clear potential for this technology to help address some of the toughest challenges that accompany digital transformation.
With assistance from machine learning and analytics, IT can spend less time finger-pointing and more time acting on high-value information to support critical business operations and improve end-user experience.
For more on choosing the best AIOps tool for you, check out our condensed version of this guide. It’s a great resource to get you started on your decision-making journey.
 Gartner, Innovation Insight for Artificial Intelligence for I&O Transformation, Milind Govekar and Chirag Dekate, 4 January 2019
 Gartner, Predicts 2019: IT Operations, Terrence Cosgrove, et al., 11 February 2019
 Gartner, Deliver Cross-Domain Analysis and Visibility With AIOps and Digital Experience Monitoring, Charley Rich and Padraig Byrne, 5 July 2018
 Gartner, Predicts 2019: IT Operations, Terrence Cosgrove, et al., 11 February 2019