Since we launched the OpsGenie phone call routing feature last year, we’ve had an overwhelmingly positive response from customers. So much so, in fact, that we’re dusting off this blog post from last year and updating it for everyone who is not yet familiar with the feature. Is it easy to use? Yes, it is! You see, OpsGenie routes alerts to the appropriate on-call individual using a combination of policies, on-call schedules, and so on. Prior to the launch of the feature last year, we heard similar questions from a number of our OpsGenie customers, such as “Can we route phone calls to the right person like we route the alerts?” This turned out to be a great question, one that resonated with many of our customers. For a product team, customer feedback like this is priceless!
As an alert notification solution, our first priority is to ensure that the right person is notified when there is a problem. OpsGenie sends multiple notifications through different channels, escalates, and so on, to ensure that critical alerts don’t get missed. As crucial as that is, if an alert notification system stops at just “waking you up”, it becomes part of the problem rather than the solution.
Every service provider wants their services to be available 24x7x365. But outages and planned maintenance are inevitable for online software services. Dealing with outages and communicating with users during an outage is as important as the availability of the services provided. To keep users informed, many service providers use web-based “status pages” that contain up-to-date information about the health of the services, incidents, and what the provider is doing to resolve the issues.
OpsGenie is an incident management system for Dev & Ops teams. Customers use OpsGenie to consolidate the alerts generated by their monitoring systems and route them to the right people using on-call schedules and escalations. Because OpsGenie is an essential tool during outages and holds vital information about incidents, our customers have been asking whether we can create “status pages” programmatically based on the alerts generated in OpsGenie.
In response to this request, we’ve taken up the challenge of providing a solution for managing status pages for OpsGenie customers.
As long as our applications are in production, boosting uptime and avoiding outages is the highest priority for us as developers and operations teams. Despite great care, achieving 100% uptime and avoiding outages is a challenging task for even the most stringent DevOps teams. Let’s imagine that one of your data centers stops responding and in turn your email service is completely out, or your payment service has gone offline during Black Friday. Remember the AWS outage that lasted four days and affected countless cloud services in April 2011? It is a good example that outages happen even in the most secure environments. Now what? Are you going to examine huge log files to find out what went wrong? Are you going to notify all of your operations teams and developers at the same time to investigate the cause? Unless you allocate large resources for chaos engineering like Netflix does, you will most likely have very limited time to overcome the issue. So those aren’t realistic options for most organizations.
I’ve spent many years implementing traditional enterprise IT operations management tools. Integrations among various tools are often the Achilles’ heel of management systems. Integrating various applications is often a high-risk endeavor for customers. Enterprise vendors typically charge tens of thousands of dollars for integration “plugins”, and the implementation requires highly skilled (and expensive) engineers. To make matters worse, enterprise vendors are often not keen on collaborating with their competitors, even when doing so would help their customers. Vendors sometimes even block these integration efforts. I’ve witnessed a vendor refuse to sell their product just to prevent anyone from integrating with it (how is that for putting the customer first?).
The OpsGenie Webhook integration provides great flexibility for building solutions to specific requirements. In this blog post, we'll build a real-time dashboard for OpsGenie alerts. This dashboard will provide a quick overview of the most recent open alerts in OpsGenie, and when there is new activity on an alert, the dashboard will reflect the change immediately.
The solution leverages AWS API Gateway and Lambda services as a serverless backend and PubNub for the real-time data stream. Both AWS and PubNub offer free tiers.
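To make the data flow concrete, here is a minimal sketch of the Lambda side, assuming API Gateway's Lambda proxy integration (which hands the webhook body to the function as a JSON string in `event["body"]`). The payload field names (`action`, `alert.alertId`, `alert.message`), the channel name, and the `publish` callable are assumptions made for illustration; in a real deployment `publish` would wrap the PubNub SDK's publish call.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical channel name that the dashboard page would subscribe to.
CHANNEL = "opsgenie-alerts"


def build_message(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a webhook payload to the fields a dashboard row needs.

    The field names below are assumptions about the payload shape.
    """
    alert = payload.get("alert") or {}
    return {
        "action": payload.get("action", "unknown"),
        "alertId": alert.get("alertId"),
        "message": alert.get("message", ""),
    }


def make_handler(publish: Callable[[str, Dict[str, Any]], None]):
    """Create a Lambda handler that forwards alert activity via `publish`."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        # With the proxy integration, the HTTP body arrives as a string.
        try:
            payload = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
        if not isinstance(payload, dict):
            return {"statusCode": 400, "body": json.dumps({"error": "expected an object"})}

        # Push the trimmed-down message to the real-time channel.
        publish(CHANNEL, build_message(payload))
        return {"statusCode": 200, "body": json.dumps({"ok": True})}

    return handler
```

Passing `publish` in as a parameter keeps the handler free of any SDK dependency, so the parsing logic can be exercised locally with a plain function before wiring in the real-time service.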
The OpsGenie integration family has many members and is still growing. The objective of this blog post is to explain how to use one of the newly added ones, Logstash. Logstash is a data pipeline that helps you process logs and other event data from a variety of systems.
In most use cases, a Logstash pipeline has one or more input, filter, and output plugins, and Logstash has a rich collection of input, filter, codec, and output plugins to choose from. An input plugin reads events from a particular source. A filter plugin performs intermediary processing on an event. Filters are often applied conditionally depending on the characteristics of the event. An output plugin sends event data to a particular destination. Outputs are the final stage in the event pipeline.
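The stages described above can be modeled in a few lines of Python. This is only a conceptual sketch of the input, filter, and output flow, not Logstash's configuration syntax or plugin API; the names `Filter` and `run_pipeline` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Event = Dict[str, object]


class Filter:
    """A processing step that runs only when its condition matches the event."""

    def __init__(
        self,
        apply: Callable[[Event], Event],
        condition: Optional[Callable[[Event], bool]] = None,
    ) -> None:
        self.apply = apply
        self.condition = condition

    def __call__(self, event: Event) -> Event:
        # Filters are applied conditionally; no condition means "always".
        if self.condition is None or self.condition(event):
            return self.apply(event)
        return event


def run_pipeline(
    inputs: Iterable[Iterable[Event]],
    filters: List[Filter],
    outputs: List[Callable[[Event], None]],
) -> None:
    """Read events from every input, run the filters in order, then emit."""
    for source in inputs:
        for raw in source:
            event = dict(raw)  # work on a copy so the source is not mutated
            for step in filters:
                event = step(event)
            # Outputs are the final stage: each one receives the event.
            for emit in outputs:
                emit(event)
```

For example, a filter built as `Filter(add_tag, condition=lambda e: e.get("level") == "ERROR")`, where `add_tag` is any function that returns a modified event, would touch only error events, while every event still reaches each output.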
In the previous blog post, we went through how to create JIRA issues from OpsGenie alerts using the open-source Marid utility. The Marid approach is particularly useful when integrating OpsGenie with an on-premises, self-hosted JIRA instance, since Marid initiates the connection and does not require opening up the network.
Whenever possible, we implement direct integrations between OpsGenie and the IT management services used by our customers. For example, we have a direct inbound integration with JIRA that enables OpsGenie customers to create alerts and notify users about JIRA issues.
Mobile development takes considerable effort to meet the expectations and needs of customers. OpsGenie Mobile Apps provide a user-friendly UI along with a good user experience design; however, mobile development needs much more! Besides providing a user-friendly UI, mobile apps should be fast, stable, and memory-friendly. Therefore, we monitor the OpsGenie Mobile apps continuously with the help of New Relic and Crashlytics so that we can keep improving them (and of course applicatively).