Most operations teams use a number of disparate monitoring tools and services to monitor their technology infrastructure: networks, systems, applications, and so on. These tools all offer some degree of alerting: they generate alerts when they detect problems and send notifications via email and other channels. Yet alerting, and particularly what happens after an alert is generated, differs significantly between tools.
The operations folks at Etsy said it best: “measure anything, measure everything”. Collecting, visualizing, and alerting on metric (aka time series) data are essential operations management capabilities. We need to track not only system metrics such as CPU and memory utilization but also, even more importantly, application and business metrics such as response times and number of transactions.
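To make the idea concrete, here is a minimal sketch of recording system, application, and business metrics as time series and checking them against alert thresholds. It is illustrative only: the `MetricStore` class, the metric names, and the threshold values are all assumptions made up for this example, not the API of any particular monitoring tool.

```python
import time
from collections import defaultdict


class MetricStore:
    """Hypothetical in-memory store: metric name -> list of (timestamp, value)."""

    def __init__(self):
        self.series = defaultdict(list)

    def record(self, name, value, timestamp=None):
        # Each data point is stamped with the time it was recorded.
        ts = time.time() if timestamp is None else timestamp
        self.series[name].append((ts, value))

    def latest(self, name):
        # Most recent value for the metric, or None if nothing was recorded.
        points = self.series.get(name)
        return points[-1][1] if points else None

    def breaches(self, name, threshold):
        # True when the most recent value exceeds the threshold.
        value = self.latest(name)
        return value is not None and value > threshold


store = MetricStore()
store.record("system.cpu.utilization_pct", 42.0)   # system metric
store.record("app.checkout.response_ms", 870.0)    # application metric
store.record("business.orders.count", 17)          # business metric

# Assumed thresholds, chosen only for illustration.
thresholds = [
    ("system.cpu.utilization_pct", 90.0),
    ("app.checkout.response_ms", 500.0),
]
alerts = [name for name, limit in thresholds if store.breaches(name, limit)]
print(alerts)
```

In this sketch the CPU reading is healthy while the application response time breaches its limit, which is the kind of problem that system metrics alone would miss.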