Before the philosophy of DevOps, developers would build products, services, and infrastructure, but the responsibility for maintaining them would shift to operators, aka system or IT admins. The DevOps philosophy removes the boundary between Operations and Development teams, making system reliability a shared responsibility of all parties.
This post is an excerpt from our White Paper: Scaling On-Call in a DevOps Organization, which you can download to learn more about on-call best practices.
Being on-call can be a daunting and disruptive experience. Many people with on-call duties complain that having to be ready to handle incidents affects their work-life balance, and even their health: on-call employees may be frequently woken up in the middle of the night, or may need to plan evenings and weekends around on-call duties. As organizations roll out changes to scale their on-call teams, they need to consider how best to match that evolution with a sustainable and humane solution. Below is some advice based on our experiences so far at OpsGenie with our customers.
For more information on the subject, download our recent White Paper: Scaling On-Call in a DevOps Organization.
Speed of execution is the greatest strength of agile companies. When we look at the Agile and DevOps mindset, we can see that the sole purpose of these movements is to make things faster without sacrificing quality.
We’re thrilled to announce the release of our advanced reporting and analytics capabilities, which let you analyze your incident response with a variety of granular reports! Powered by Looker’s data visualization and exploration features, our new platform unlocks your Operations data and enables self-service analytics through unique features such as:
- Powerful visualizations: Our new platform provides many different visualizations you can use to make sense of your data. From Cartesian charts to pie and donut charts to timelines and tables, your data is visualized in various ways to help you gain insights quickly.
At OpsGenie, it is important for us to provide the best user experience and to keep our integrations up-to-date. We are an early adopter of the newest features released by the companies with which we integrate.
OpsGenie’s Slack App is one of the most popular integrations with our customers. The Slack integration allows users to forward alerts to Slack channels, and then lets them interact with those alerts using either slash commands or Slack buttons. For more information, you can refer to the ChatOps with Slack and OpsGenie page.
Today OpsGenie announces its new integration with Slack’s Message Menus, an interactive menu feature. Now, a simple set of menus lets you easily perform Assign, Take Ownership, and Snooze actions on alerts in the OpsGenie Slack App.
The OpsGenie team recently had a thorough and heated discussion (KAPOW!!) about who would be better with on-call alerts and incident management: Superman or Batman? Who would come out as the winner if they were pitted against each other in a war of on-call alerts and response time? So, we thought we would hash it out here on our blog in a completely fictional format. We’ll examine each area of alerting and incident management to see who we would want on our on-call team.
Erik Budin of ScienceLogic has a great blog post that describes the integration of ScienceLogic with (our competitor) PagerDuty. Kudos to both parties for coming up with a well-thought-out, bidirectional integration that goes well beyond the alerting integration supported by many of the monitoring solutions on the market! We believe that to truly enable operations teams to work effectively, monitoring and alerting integration needs to be much richer than just forwarding alerts. Hence, it’s good to see this type of effort implemented and described in detail. Erik starts the blog post with a real-world scenario that has become possible with the integrated solution: