Technological difficulties are an inescapable part of everyday life, and they can be massively inconvenient. Just ask any of the United Airlines passengers on the hundreds of flights that were delayed earlier this year due to computer problems. In that instance (a notable but not isolated example), the inability to quickly resolve a system issue wasted hours of people's time all over the country. We see smaller examples of frustrating computer failures in our day-to-day lives, and the route to identifying and solving an issue is often fraught with turbulence.
These issues are so persistent that, in the Information Technology world, their resolution is integral to the success of entire organizations, and a full-time job for service desk professionals. Surprisingly, even at the professional level, many lack the technology to help them efficiently resolve issues at the source. Teams in even the largest IT environments have replaced costly hardware only to find that the same issues keep arising, building mistrust and frustration among end-users. But it doesn't have to be this way...
Root cause analysis: a better approach
Root cause analysis is the process of examining the underlying issues that caused an error in a system. In an IT environment, this means analyzing the specific hardware and software performance indicators at the time of the error, and determining whether it was user action, a hardware or software fault, or an external problem that caused the failure.
How to perform stellar root cause analysis
1. Treat the root of the problem, not the symptoms
Treating the cause of a problem, rather than the symptoms, is what defines root cause analysis. Going straight to the source ensures that the same issue doesn't keep happening, which can lead to lower ticket volume for helpdesks, and greater system up-time and user productivity.
Consider a case in which an end-user calls the helpdesk to report a slow computer. The technician who receives the call opens her system administration software to take a closer look. She confirms that the machine is having an issue by observing a low user experience score. The technician knows that a slow computer can have a variety of causes. To begin the search for the root cause, she considers the best way to gather quantifiable evidence of the slowness.
2. Lead with endpoint data
Today's computing environments are incredibly complex, and critical IT functions may or may not be managed in house. This is why the endpoint now offers IT the most privileged point of view: it is the one place where the user's actual experience can be observed directly.
In our example, the technician uses her software to discover that a program is using an unusually large amount of RAM. She's able to do this via an endpoint agent installed on the end-user's machine, which allows her to instantly see a variety of quantitative metrics about the computer. Investigating further, she finds that a software patch to fix this issue was recently released, but has not been installed yet. After installing the patch, the memory leak is fixed, and the user experience score improves dramatically.
The root cause of the machine's slowness was a memory leak in software that had not yet received an available patch. By determining that it was a software issue, and not a hardware or networking problem, the technician avoided escalating the ticket or pursuing it further. Her ability to see metrics about the computer straight from her desk allowed her to quickly diagnose, investigate, and resolve the issue.
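To make the diagnostic step concrete, here is a minimal Python sketch of how per-process memory readings from an endpoint might be screened for an unusually heavy consumer. Everything in it is an assumption for illustration: the `ProcessSample` record, the `flag_memory_hogs` function, the 25% threshold, and the sample figures are hypothetical and do not represent any particular agent's API or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessSample:
    """One hypothetical reading reported by an endpoint agent."""
    name: str
    rss_mb: float  # resident memory used by the process, in megabytes


def flag_memory_hogs(samples, total_ram_mb, threshold=0.25):
    """Return processes using more than `threshold` of total RAM, largest first.

    The 25% default is an arbitrary illustrative cutoff, not a recommended value.
    """
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    hogs = [s for s in samples if s.rss_mb / total_ram_mb > threshold]
    return sorted(hogs, key=lambda s: s.rss_mb, reverse=True)


# Made-up readings for a single machine with 16 GB of RAM.
samples = [
    ProcessSample("browser", 1800.0),
    ProcessSample("mail_client", 6200.0),
    ProcessSample("editor", 350.0),
]

hogs = flag_memory_hogs(samples, total_ram_mb=16000.0)
for proc in hogs:
    print(f"{proc.name}: {proc.rss_mb:.0f} MB")
```

A fixed share of total RAM is only one possible definition of "unusually large"; comparing each process against its own historical baseline would be a reasonable alternative.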
3. Automate. Automate. Automate.
For years we have known that when it comes to IT service management, being proactive is better than being reactive. But organizations still struggle to reach these proactive goals. Why? The ability to be proactive depends directly on the quality of the data being analyzed and on the tools used to predict issues and prescribe fixes based on that data.
Looking ahead, the technician from our example can automate the resolution of this issue by utilizing the endpoint agents. By identifying other machines with the uninstalled patch and sending out an update to all the associated agents, she ensures that no more tickets reach her desk for the same problem.
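As a rough illustration of the first half of that automation, the Python sketch below picks out machines that run the affected software but lack the patch. The `Endpoint` record, the `machines_needing_patch` function, and the hostnames, software name, and patch identifier are all hypothetical; a real deployment would pull this inventory from its endpoint agents and push the update through its own management tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical inventory record for one managed machine."""
    hostname: str
    installed_software: set[str]
    installed_patches: set[str] = field(default_factory=set)


def machines_needing_patch(endpoints, software, patch_id):
    """Return sorted hostnames that run `software` but lack `patch_id`."""
    return sorted(
        e.hostname
        for e in endpoints
        if software in e.installed_software
        and patch_id not in e.installed_patches
    )


# Made-up inventory; names and identifiers are placeholders.
fleet = [
    Endpoint("desk-001", {"mail_client", "browser"}, {"PATCH-A"}),
    Endpoint("desk-002", {"mail_client"}),
    Endpoint("desk-003", {"browser"}),
    Endpoint("desk-004", {"mail_client", "editor"}, {"PATCH-B"}),
]

targets = machines_needing_patch(fleet, "mail_client", "PATCH-A")
print(targets)
```

Filtering on the affected software first keeps the update from being sent to machines that cannot exhibit the problem at all.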
Root cause analysis as a key component of Workspace Analytics
Workspace analytics is a critical component of any modern, metric-driven IT workplace. It allows IT professionals to make important decisions based on quantitative measures rather than subjective estimates. Excellent root cause analysis by technicians of all seniority levels is a key aspect of workspace analytics. Efficient analysis and problem solving can lead to lower overall IT costs by reducing excessive resource provisioning, better helpdesk KPIs by lowering the number of escalated tickets, and higher end-user satisfaction and productivity through better user experience scores.
Brandon Oyer is an applied engineer at Lakeside.