How to configure CPU alerts with Logit.io?
CPU alerts are a critical component of proactive system management. They provide insights into system performance, resource allocation, security, and cost control, enabling organizations to optimize their IT infrastructure, enhance user experiences, and ensure the reliability and security of their systems.
In the steps below, we'll walk you through the process of setting up a CPU alert rule in Grafana. This rule will monitor the idle CPU usage of your servers and notify you when it exceeds a certain threshold, which could indicate underutilization. While this is a specific example, the process for setting up other alert rules in Grafana will follow a similar pattern.
Pre-requisites
Configure contact points
Configure Notification Policies
1. New Alert Rule
Grafana has an entire section dedicated to alerts. In the main dashboard, find and select the Alerts section. Here, you'll see an option for 'New Alert Rule'. When you select this, you'll be taken to a new screen where you can set up your alert. This is where you'll specify what conditions must be met for the alert to be triggered.
2. Select a Metric
A metric in this context is a measurement that your system is tracking. You might track many metrics, such as CPU usage, memory usage, network traffic, etc. Here, you're selecting 'cpu_usage_idle' which represents how often your CPU is idle, or in other words, not being used. This metric can help you understand if your system is being underutilized or overutilized.
In this step, you're also choosing which instances of your metric to apply the alert rule to. For example, if you're monitoring several servers, 'host' could be a label that indicates which server the CPU data is coming from. You can also set the comparison to '=~', which means you'll be able to use regex (regular expressions) to match multiple hosts based on patterns. If you have multiple servers sending CPU data, you can either select one specific server or use a regex to select multiple servers that match a pattern. For example, 'dashboard[0-9].*' would select all servers with names that start with 'dashboard' followed by any number.
3. Change the Function
This step is about choosing how the metric value is calculated. By default, Grafana uses the 'last()' function, which means it only looks at the most recent data point. By changing it to 'avg()', you're telling Grafana to calculate the average value over a certain period of time. This can help prevent alerts from being triggered by brief spikes in usage.
We can choose the limit that triggers the alert. If you set "IS ABOVE" to 90, then the alert will be triggered when the average idle CPU usage goes above 90%. This means you'll be notified if your CPU is idle more than 90% of the time, which could indicate underutilization.
4. Alert Evaluation
This step determines how often Grafana checks whether the alert conditions are met. By default, Grafana checks every minute. But you can change this to check less frequently, say every 5 minutes if you don't need real-time alerting.
5. Additional Labels
Here, you're adding more conditions to your alert rule. You're adding the labels 'cpu' and 'cpu-total', which could correspond to other metrics related to CPU usage. By adding these conditions, you're saying the alert should also consider these metrics, not just the idle CPU usage.
6. Name and Group the Alert
It's important to keep your alerts organized, especially if you have a lot of them. By giving the alert a name and categorizing it under a specific folder and group, you can easily find and manage it later.
7. Set Notifications
This step is about deciding how you'll be notified when the alert is triggered. You can label the alert as 'severity=warning', which tells you and your team that this is a warning level alert. You also need to set up a notification channel, such as email or Slack, through which you will receive the alert notifications.
8. Save the Alert Rule
After you've set up the alert rule to your satisfaction, the final step is to save it. Once saved, Grafana will start monitoring the specified metric and trigger the alert if the conditions you set are met.
If you have any questions or issues creating alerts, then reach out to a member of the Logit.io team via live chat and we'll be happy to help.