Creating actionable alerts is a continuous process that can enhance your workflows so that not only are the correct people notified at the right time, but they can also take immediate action to reduce potential business impact. This post is the first in a three-part series about alert enrichment. Without actionable alerts, your responders may be alerted to an issue but unable to take immediate action, which can increase downtime and slow down remediation. Actionable alerts set your responders up for success from the start of an incident and empower them to begin repairing damaged services right away. There are many ways to create actionable alerts, so as an introduction to our newest White Paper, Creating Actionable Alerts to Maximize Resolution Speed, we want to share our first method.
One of the most impressive books on DevOps, “The DevOps Handbook”, emphasizes three fundamental principles underpinning DevOps: systems thinking, amplifying feedback loops, and continual experimentation and learning.
In his blog post, Gene Kim describes amplifying feedback loops as creating right-to-left feedback loops so that corrections can be made continually. But let’s start with why we should do this in the first place.
Being on-call can be a daunting and disruptive experience. Many people with on-call duties complain that having to be ready to handle incidents affects their work-life balance and even their health, since on-call employees may be woken up frequently in the middle of the night or may need to plan evenings and weekends around their on-call duties. As organizations roll out changes to scale their on-call teams, they need to consider how best to match that evolution with a sustainable and humane approach. Below is some advice based on our experience with OpsGenie customers so far.
For more information on the subject, download our recent White Paper, Scaling On-Call in a DevOps Organization.
Follow-the-sun schedules are a way for your company to offer 24/7 global customer support while also preventing on-call burnout for your engineering and customer support teams. Because someone is on call at all times across different time zones, no single team has to wake up in the middle of the night to deal with an alert or customer issue. True to its name, the schedule ideally follows the sun: the configuration usually consists of three rotations staggered to cover three 8-hour shifts. However, there are multiple ways to configure a follow-the-sun schedule using OpsGenie schedules.
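To make the three-rotation idea concrete, here is a minimal sketch of how staggered 8-hour shifts can cover a full UTC day. The team names (APAC, EMEA, AMER) and shift start hours are hypothetical assumptions chosen for illustration; this is not an OpsGenie schedule definition or API, just a model of the logic behind one.

```python
from datetime import datetime, timezone

# Hypothetical follow-the-sun setup: three rotations, each owning one
# 8-hour block of the UTC day. Names and start hours are illustrative only.
ROTATIONS = [
    ("APAC", 0),   # 00:00-08:00 UTC (daytime in e.g. Sydney)
    ("EMEA", 8),   # 08:00-16:00 UTC (daytime in e.g. London)
    ("AMER", 16),  # 16:00-24:00 UTC (daytime in e.g. New York)
]
SHIFT_HOURS = 8


def on_call_team(moment: datetime) -> str:
    """Return the rotation whose shift contains `moment` (must be timezone-aware)."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    hour = moment.astimezone(timezone.utc).hour
    for name, start in ROTATIONS:
        if start <= hour < start + SHIFT_HOURS:
            return name
    raise ValueError("rotations do not cover the full day")


def covers_full_day() -> bool:
    """Check that every UTC hour is owned by exactly one rotation (no gaps, no overlaps)."""
    for hour in range(24):
        owners = [n for n, s in ROTATIONS if s <= hour < s + SHIFT_HOURS]
        if len(owners) != 1:
            return False
    return True
```

For example, an alert raised at 03:00 UTC would go to the hypothetical APAC rotation, while one at 20:00 UTC would go to AMER. Checking for gaps and overlaps up front mirrors what you would want to verify in any real follow-the-sun configuration: every hour has exactly one responsible team, so nobody is paged outside their working day.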
One of the key ways to stay in the know about all things DevOps is by attending one of the many conferences aimed at educating developers, engineers, and other technical professionals on best practices and the newest innovations in the DevOps realm. Whether you’re looking for new tools to implement, application-building guidance, security tips, information on cloud software and storage, serverless infrastructure, automation, or something else on your list, these conferences bring together industry leaders and experts to share their wealth of knowledge. We at OpsGenie attend some of these events ourselves, and we’d love to see you there!
Over the last few years, teams have realized the benefits of sharing and distributing knowledge in chat applications such as Slack. Today, teams are extending the use of these applications beyond collaboration by embracing ChatOps. ChatOps empowers teams by bringing complex day-to-day operational work into shared chat channels. If done correctly, it drastically reduces context switching and increases the speed at which teams can tackle tough challenges.
There’s no shortage of reasons why an organization decides to move part, or all, of its operations to the cloud. Generally speaking, these reasons fall into three categories: improving competitiveness, reducing cost, and offsetting risk. These reasons often overlap as well. For example, a company looking to move application development into cloud environments could be targeting this as a way to accelerate deployment, reduce IT overhead costs, improve team collaboration, and scale more rapidly (operationally or geographically).
In 1970, a series of devastating wildfires swept across Southern California, destroying over 700 homes across 775 square miles in 13 days and resulting in more than $233 million in losses (over $1 billion in today’s dollars, adjusted for inflation). Thousands of firefighters from around the state and beyond responded, but found it very difficult to work together. They certainly knew how to fight fires, but they lacked a common management framework that could scale up or down based on the needs of the incident. They also lacked a standardized approach to incident leadership that extended beyond each individual fire department. Shortly thereafter, fire service leaders came together and created a new and, at the time, revolutionary system for managing incidents, capable of handling everything from everyday fire and medical incidents to large-scale incidents that make the national news. A new way of managing incidents was born!
We love Slack as much as you do, because it is where we get things done at work. Slack applications are the gateway to our favorite tools, such as Intercom, Jira, Google Drive, and many more. There are also ChatOps tools, like OpsGenie’s Slack application, that focus on improving collaboration and automation by bringing day-to-day operational challenges into shared chat channels.