Three Keys to Incident Response: On-Call Schedules, Escalation Policies, and Routing Rules
Organizations are drowning in alerts, incidents, and chaos that prevent them from doing their jobs and serving their customers.
Notably, for businesses that operate always-on services, an outage or downtime can be devastating to the bottom line, not to mention a poor experience for customers and users.
Without a robust, proactive plan and the tools in place to automatically kick off remediation, a small issue can quickly become a significant incident. Teams commonly experience disjointed handling of events, uncertainty about who is on call, slow or missing acknowledgement and resolution, a drop in customer satisfaction, and loss of business.
So what do you do? How do you ensure you have the systems and tools in place to respond smoothly to alerts and incidents, reduce mean time to acknowledge and resolve, and delight your customers along the way?
It starts with thinking strategically about your incident response plan. While a broader plan should be developed and agreed to, today we want to focus on three critical areas tied to a successful response to alerts and incidents: on-call schedules, escalation policies, and routing rules. Within Opsgenie you control these three key areas, and they make all the difference in effectively handling alerts and incidents.
Let’s take a look at each in detail.
There is nothing like getting a handwritten card in the mail. However, handwriting your on-call schedule or maintaining it in an Excel spreadsheet is one of the least efficient ways to schedule on-call team members.
Opsgenie eliminates this cumbersome task and automates schedules according to your organization's needs. You can create, manage, and track who is on call and for what duration. On-call scheduling is highly configurable: rotations can be built from daily, weekly, or custom shifts, and you can even restrict a rotation to particular days of the week and times of day.
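To make the idea of a rotation concrete, here is a minimal sketch of how a fixed-length rotation can be resolved to a single person at a given moment. It is an illustration only, not Opsgenie's API or configuration format; the function name `on_call`, the team names, and the start date are all hypothetical.

```python
from datetime import datetime, timedelta


def on_call(participants, rotation_start, shift_length, at):
    """Return the participant whose shift covers the moment `at`.

    Hypothetical helper: participants take turns in list order, each
    covering one shift of `shift_length`, starting at `rotation_start`.
    """
    if at < rotation_start:
        raise ValueError("rotation has not started yet")
    # Dividing two timedeltas with // gives the number of whole shifts elapsed.
    shifts_elapsed = (at - rotation_start) // shift_length
    return participants[shifts_elapsed % len(participants)]


# Assumed example data: a three-person weekly rotation starting on a Monday.
team = ["alice", "bob", "carol"]
start = datetime(2024, 1, 1, 9, 0)
weekly = timedelta(weeks=1)

print(on_call(team, start, weekly, datetime(2024, 1, 10, 12, 0)))  # bob
```

Switching to a daily rotation in this sketch only means passing `timedelta(days=1)` as the shift length; restrictions by day of week or time of day would need an extra check that is left out here.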
To see on-call schedules in action, watch our video below.
Suppose your on-call team member has been having trouble with his phone, and in the middle of the night it finally stops working. He won't see the alert until several hours later. What do you do? Escalation policies ensure that when an alert goes unacknowledged for a set amount of time, an escalation path is triggered so that critical alerts are never missed.
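The escalation idea can be sketched as a list of steps, each with a delay measured from the moment the alert was created. The sketch below is a simplified model under assumed names and delays (`EscalationStep`, `notified_so_far`, and the 10 and 30 minute thresholds are hypothetical), not Opsgenie's actual policy format.

```python
from dataclasses import dataclass


@dataclass
class EscalationStep:
    delay_minutes: int  # minutes without acknowledgement before this step fires
    notify: str         # who gets notified at this step


def notified_so_far(steps, minutes_unacknowledged):
    """Return everyone who should have been notified by now, in escalation order."""
    ordered = sorted(steps, key=lambda step: step.delay_minutes)
    return [s.notify for s in ordered if s.delay_minutes <= minutes_unacknowledged]


# Assumed example policy: the primary is notified immediately, then the path widens.
policy = [
    EscalationStep(0, "primary on-call"),
    EscalationStep(10, "secondary on-call"),
    EscalationStep(30, "team lead"),
]

# Twelve minutes in, the primary's broken phone no longer matters.
print(notified_so_far(policy, 12))  # ['primary on-call', 'secondary on-call']
```

In this model, acknowledging the alert simply stops the clock, so later steps never fire.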
To see escalation policies in action, watch our video below.
What would you do if every night at 2 A.M. your neighbor rang your doorbell to tell you your house was on fire, when in reality there was no fire? Well, our first thought is that you have annoying neighbors. However, we understand alert fatigue. No one wants to be woken up unless intervention is needed. That is why routing rules in Opsgenie ensure alerts are routed to the right people and teams based on alert content and time of day. Sleep well knowing you will only be alerted if there is an urgent problem.
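As a rough illustration of routing on alert content and time of day, the sketch below pages someone for urgent alerts at any hour and defers everything else until working hours. The priority label, the business-hours window, and the action names are assumptions chosen for the example, not Opsgenie's rule syntax.

```python
from datetime import time


def is_business_hours(t):
    """Assumed working window of 09:00 to 18:00; adjust to your team."""
    return time(9, 0) <= t < time(18, 0)


def route(alert, received_at):
    """Pick an action from the alert's content and the time it arrived.

    `alert` is a plain dict with a hypothetical 'priority' field;
    `received_at` is a datetime.time.
    """
    if alert["priority"] == "P1":
        return "page on-call"          # urgent: wake someone up, day or night
    if is_business_hours(received_at):
        return "notify team channel"   # not urgent, but people are at work
    return "hold until morning"        # not urgent and nobody should be woken


print(route({"priority": "P1", "message": "checkout is down"}, time(2, 0)))
print(route({"priority": "P4", "message": "disk at 70%"}, time(2, 0)))
```

With these assumed rules, the 2 A.M. doorbell only rings for the first alert; the second waits until the team is back at work.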
To see routing rules in action, watch our video below.