
The State of Alerting in the IT Ops world

OnPage Corp. just finished a survey of more than 100 ITOps professionals from across the United States. Our goal was to acquire a greater understanding of how well engineers in the industry are performing when it comes to critical alerting and alert management of their IT teams.

We wanted to understand the upstream causes of alert fatigue in ITOps. We also wanted to see how many alerts teams receive per day and who gets alerted, how alerts are managed, and how well teams analyze their actions and carry those lessons forward.

In many ways, the survey was successful. We received a large number of responses from a number of industries and acquired a strong sense of how ITOps is performing across the country. Unfortunately, we also saw that for all the Chaos Monkeys and strides toward improved response to alerting, there is still a significant lack of progress.

What’s All the Buzz About?

Automated alerting is an essential component of monitoring. It lets teams receive automatically generated alerts from multiple points along their IT stack and software. In theory, this multitude of alerts enables teams to identify the causes of a problem more quickly and minimize its severity. The hope is that, with early recognition of the issue, engineers will be able to minimize service degradation and disruption.

But alerts aren’t always as effective as they could be or need to be. Real problems are often lost in a sea of noisy alarms. As our survey showed, this is because teams receive alerts through multiple channels at once, and the resulting barrage leaves them practically unable to cope.

The Law of Above Average

Our survey showed that more than 80 percent of IT teams are alerted to critical incidents via email. General best practices would dictate that email is fine for daily communication inside a business. However, for critical incidents, email is less than ideal, as it allows critical incidents to get buried under a pile of other emails. There is no way for critical issues to rise to the top of the pile.

Since respondents could select more than one answer, they could indicate every way in which they receive notice of critical incidents. So, while email was the most prevalent form of communication, individuals indicated that they are also simultaneously receiving alerts by SMS or phone call. Our survey showed that 58.9 percent and 51.4 percent of respondents received alerts via these methods.

Already from this nugget of information, we see the opportunity for both information overload and missed alerts. When alerts arrive simultaneously through multiple formats, the level of irritation and overload inevitably rises. At the same time, if email is the only form in which IT professionals receive alerts, there is a high chance of missing alerts altogether.

How Many Alerts Was That?

The survey results also indicated that just over 41 percent of ITOps receive 11 alerts or more per day. Additionally, just over 20 percent of this group received 40 alerts or more per day. While 40 alerts is clearly more than a team can reasonably manage or should manage, this figure also goes a long way toward explaining why some alerts just get missed. If more than 40 alerts are sent to you and your team every day, it becomes very difficult to prioritize alerts and determine which should be handled first.

Perhaps to better manage this large number of alerts, many teams use escalation procedures. Our survey showed that 76.6 percent of respondents have some sort of escalation procedure in place. At the same time, the most frequent ways to escalate critical alerts were email and SMS.
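To make the idea of an escalation procedure concrete, here is a minimal sketch in Python. It is an illustration only: the role names, channels, and wait times are hypothetical and do not come from the survey. Each step names who is contacted and how long the alert may go unacknowledged before the next step takes over.

```python
from dataclasses import dataclass


@dataclass
class EscalationStep:
    contact: str        # hypothetical on-call role
    channel: str        # hypothetical channel, e.g. "push", "sms", "phone"
    wait_minutes: int   # time allowed for an acknowledgment at this step


# Hypothetical policy: values chosen for illustration only.
POLICY = [
    EscalationStep("primary on-call", "push", 5),
    EscalationStep("secondary on-call", "phone", 10),
    EscalationStep("team lead", "phone", 15),
]


def who_to_notify(minutes_unacknowledged, policy=POLICY):
    """Return the step that is active after the alert has gone
    unacknowledged for the given number of minutes."""
    elapsed = 0
    for step in policy:
        elapsed += step.wait_minutes
        if minutes_unacknowledged < elapsed:
            return step
    # Past the end of the policy: stay on the final step.
    return policy[-1]
```

With this policy, an alert that has gone unacknowledged for 7 minutes has already moved from the primary to the secondary on-call contact.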

The conclusion one can draw from these numbers is that, despite the large number of papers written on improving alert management, many ITOps teams have not been able to achieve this end. While our survey did show that just shy of 59 percent receive a manageable number of alerts, 41 percent are inundated.

Not Just Intelligence, Business Intelligence

Perhaps analysts of the industry could be more optimistic if they saw that teams were using analytics to track how well they are performing. If teams employed analytics, they would be better able to review their progress, see where they are failing to meet the grade and then embark on routines to improve. Unfortunately, this is not the case.

When asked whether their team has employed any type of business intelligence to review and analyze its performance, more than 70 percent reported that they did not subscribe to any BI platform. The problem with this result is more than just a missed opportunity; it is also a lost chance to fundamentally improve the business at many levels.

One of the most important reasons to invest in an effective BI system is that such a system can improve efficiency within your organization and, as a result, increase productivity. Effective business intelligence can also improve the decision-making processes at all levels of management and improve your tactical and strategic management.

Yet by forgoing investments in BI tools, teams fail to examine the processes and methods that could improve their performance and minimize alert fatigue.

A Call for Smart Alerting

The lesson that can be drawn from this is that companies don’t necessarily need more alerting. What they do need is a shift toward smarter alerting.

Smart alerting means that not every bump on the monitoring screen gets tied to an alert. Instead, monitoring output is calibrated so that possibilities are aligned with probabilities and impacts. Alerts also get sent to the teams or individuals that are best able to manage the issue. Additionally, alerts are actionable and come with instructions regarding what the problem might be.
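One way to picture this kind of calibration is the short Python sketch below. Everything in it is an assumption made for illustration: the scoring rule (probability times impact), the threshold, the team names, and the runbook hints are hypothetical and not something the survey prescribes. The sketch drops low-scoring events, routes the rest to the team best placed to handle them, and attaches a hint so the alert is actionable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    source: str         # which part of the stack raised the event
    probability: float  # estimated chance it reflects a real problem (0 to 1)
    impact: int         # estimated business impact, 1 (low) to 5 (high)


# Hypothetical routing table, hints, and threshold.
ROUTES = {"database": "dba-team", "network": "netops-team"}
HINTS = {
    "database": "Check replication lag and disk usage first.",
    "network": "Check upstream links and recent firewall changes.",
}
DEFAULT_TEAM = "ops-default"
THRESHOLD = 2.0


def to_alert(event: Event) -> Optional[dict]:
    """Turn a monitoring event into an alert, or return None if it is
    not worth waking anyone up for."""
    score = event.probability * event.impact
    if score < THRESHOLD:
        return None
    return {
        "team": ROUTES.get(event.source, DEFAULT_TEAM),
        "score": score,
        "hint": HINTS.get(event.source, "No runbook entry yet."),
    }
```

A likely, high-impact database event produces an alert for the database team, while an unlikely, low-impact blip returns None and never reaches anyone's phone.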

Smart alerting also means that teams use business intelligence tools such as reports, graphs and charts to determine which of their practices have been effective and which have not. Without this insight, teams are often unaware of the subtle points that could really impact their team and provide them with a way to improve their output.
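Even without a full BI platform, a team can start with a very simple report. The Python sketch below assumes a hypothetical alert log of (team, minutes until acknowledgment) pairs and computes the average acknowledgment time per team. The log format and the choice of metric are assumptions for illustration, not findings from the survey.

```python
from collections import defaultdict
from statistics import mean


def mean_ack_minutes(alert_log):
    """Average minutes-to-acknowledge per team.

    alert_log is an iterable of (team, minutes_to_ack) pairs,
    a hypothetical format chosen for this example.
    """
    by_team = defaultdict(list)
    for team, minutes_to_ack in alert_log:
        by_team[team].append(minutes_to_ack)
    return {team: mean(values) for team, values in by_team.items()}


# Made-up sample data.
sample_log = [("db", 4), ("db", 6), ("net", 10)]
report = mean_ack_minutes(sample_log)
```

Tracking a number like this over time shows whether changes to alerting practices are actually helping.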

Conclusion

There are a number of insights that can be garnered from our survey. I encourage you to take a moment and download a copy of the study and see what you can learn that will help your team.

The post The State of Alerting in the IT Ops world appeared first on OnPage.


More Stories By OnPage Blog

OnPage is a disruptive technology and application that leverages today's technology and smartphone capabilities for priority mobile messaging. With a top notch history of ensuring uninterrupted communication for businesses and critical response organizations, OnPage is once again poised to pioneer new mobile communications methodology for business and organizational use.
