In the previous blog article we explained how to use the Dell EMC PowerStore REST API to extract metrics. In this article we are going to use those skills to create a Prometheus exporter for PowerStore and some useful dashboards in Grafana.
This article will cover the following topics:
- Quick overview on Prometheus
- Explanation of our exporter’s code
- Explanation of our implementation of the Health metric
- How to install the exporter
- How to install our Grafana sample dashboards to visualize the data
Disclaimer: The purpose of this article and the code we have provided is educational and it is provided as-is. Feel free to download the code, play with it, customize it to suit your needs and make it more robust if you want to use it in production.
The code for the exporter has been written in Python. The code and the sample Grafana dashboards featured in this article can be accessed in this GitHub repo: https://github.com/cermegno/prometheus-grafana-dell
Quick Prometheus overview
This is going to be a “very quick” Prometheus overview. If you are interested in the topic there is plenty of good information out there. If you want me to name one source in particular, I think the author of this multi-part blog series covers all the important topics very well.
Prometheus is an open source monitoring system that has become very popular lately, due in part to the success of Kubernetes. The Prometheus architecture has a server component that includes a time series database. On the other side, the client component is responsible for extracting the metrics, but unlike most other solutions it doesn’t push the metrics to the server. Instead, it makes them available through an HTTP interface and the server reaches out to get them when appropriate. So it uses a “Pull” mechanism, not “Push”. The piece of software that extracts metrics and exposes them via HTTP is called the “exporter”.
Nowadays when applications talk to each other they usually exchange data in JSON format. Prometheus is unusual in this aspect as well because it uses a text-based exposition format that looks like this:
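Here is a representative sample of the exposition format (the metric names are illustrative, not the exporter’s actual output):

```
# HELP collection_runs_total Number of collection cycles completed
# TYPE collection_runs_total counter
collection_runs_total 42.0
# HELP volume_read_iops Read IOPS for each volume
# TYPE volume_read_iops gauge
volume_read_iops{volume="vol01"} 250.0
volume_read_iops{volume="vol02"} 180.0
```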
Each metric has a HELP and a TYPE section in addition to the actual reading. In TYPE you can see the type of metric (counter or gauge in this example). Prometheus offers a few more “metric types”, but for our purpose the Gauge “metric type” covers most, if not all, of the metrics we want to graph. To learn more about all the metric types, I like the way they are described in the GitHub repo of the Python client, including Python examples.
Labels are a way of keeping individual values for different instances of the metric. In this example we track the read IOPS for each individual volume.
Prometheus offers a very powerful query language called PromQL. On the downside, it doesn’t offer great visualizations for your data, which is why it is typically combined with Grafana.
Understanding the Exporter code
Writing an exporter for Prometheus involves:
- extracting data from the entity you want to monitor. In our case, this will be PowerStore. We will do it using the REST API as described in the previous blog article
- exposing the data via a web interface in the text-based format that the Prometheus server expects. Official client libraries are available in 5 languages so that we don’t need to create that specific format from scratch:
  - Go
  - Java or Scala
  - Python
  - Ruby
  - Rust
More unofficial client libraries are available as well. In our case we are going to use Python.
These are the basic steps in Python. First we start by importing the HTTP server and whatever “metric types” we need. In the exporter we have written, the only metric type required is “Gauge”. The second line creates an instance of “Gauge” that we are calling “CAPACITY_USED”. For clarity we have used upper case to name all the instances we have created. Inside the brackets you have ‘cap_used’, which is how it will be stored inside Prometheus, and ‘Capacity used in GB’, which is the description that will be shown in the text-based information the exporter publishes.
```python
from prometheus_client import start_http_server, Gauge

CAPACITY_USED = Gauge('cap_used', 'Capacity used in GB')
```
At this point you will have to get information from whatever target you are monitoring; in our case we will have to query the PowerStore REST API. In our exporter you will see how the details get stored in a variable called “json_resp”. This variable is a list and, as discussed in the previous article, we need to get the values from the last item, which in Python corresponds to index “-1”. Here we “set” the gauge to that value.
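As a minimal sketch, this is what that step can look like. The shape of the response and the unit conversion below are assumptions for illustration; the real exporter parses the actual PowerStore REST response:

```python
from prometheus_client import Gauge

CAPACITY_USED = Gauge('cap_used', 'Capacity used in GB')

# Hypothetical shape of json_resp: a list of samples, newest last,
# with capacity figures in bytes
json_resp = [
    {"physical_used": 1100 * 1024**3},
    {"physical_used": 1200 * 1024**3},
]

# Take the latest sample, convert bytes to GB and "set" the gauge
cap_used_gb = json_resp[-1]["physical_used"] / 1024**3
CAPACITY_USED.set(cap_used_gb)
```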
This is the basic process. The exporter does this for capacity and performance metrics for the whole appliance, for each individual volume and for each interface. All these different metrics are arranged in different Python functions for clarity. This process needs to run in a loop with some time between runs.
The final step is to make the data available through the web interface. This is accomplished as follows.
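Sketched out, the serving loop might look like this. Here `collect_all_metrics` is a hypothetical stand-in for the exporter’s collection functions, and the optional `iterations` parameter only exists to keep the sketch easy to exercise:

```python
import time

from prometheus_client import start_http_server


def collect_all_metrics():
    # Hypothetical stand-in: query the REST API and set the gauges here
    pass


def main(iterations=None):
    # Expose the metrics on port 8000; the Prometheus server scrapes
    # this endpoint on its own schedule
    start_http_server(8000)
    count = 0
    while iterations is None or count < iterations:
        collect_all_metrics()
        time.sleep(300)  # match this to your collection interval
        count += 1
```

Calling `main()` with no argument starts the HTTP server and collects forever, which is how the real exporter behaves.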
Implementing a Health metric
In an earlier article we covered the topic of creating an overall “health” score that can be used in Grafana as part of an “operations center” dashboard for all your infrastructure. Let’s see the actual implementation in our PowerStore exporter.
In the function where we extract the capacity metrics for the appliance, we can see this section, which calculates the impact of these metrics on the health score.
```python
# Health computation section
health_impact = 0
if json_resp[-1]['physical_used'] * 100 / json_resp[-1]['physical_total'] > 80:
    health_impact = 10
if json_resp[-1]['physical_used'] * 100 / json_resp[-1]['physical_total'] > 90:
    health_impact = 20
return health_impact
```
We start with a health impact of 0. If none of the thresholds have been violated, that’s the value we will return. After that we have 2 different thresholds at 80% and 90% capacity utilization. You can play with the impact values yourself; in this example we are using impacts of 10 and 20 points respectively.
If you examine the code, you will notice that it also checks the appliance read and write IOPS against some thresholds. These thresholds are arbitrary and will vary between array models. The important thing is to learn how it works so that you can adapt the code to suit your needs by adding a health calculation section for whatever metrics you want.
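A sketch of such a performance check might look like this. The threshold values and the function name are made-up examples, not the repo’s actual numbers, and should be tuned per array model:

```python
def iops_health_impact(read_iops, write_iops,
                       read_limit=100000, write_limit=50000):
    """Return a health impact score based on appliance IOPS.

    Hypothetical example: any breach of either threshold costs 10 points.
    """
    impact = 0
    if read_iops > read_limit or write_iops > write_limit:
        impact = 10
    return impact


print(iops_health_impact(120000, 10000))  # exceeds the read threshold -> 10
```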
The overall idea is to store all these “health impacts” from many areas in a list called “health_items”. Then, towards the end of the program, another function aggregates all the impacts into one. As you can see in the code below, there are 2 possibilities: by default you take the largest impact across all areas, or you could accumulate them so that when you have different (and potentially unrelated) issues the health looks critical.
Notice how we created a HEALTH gauge at the beginning and now we are “setting” it so that it gets stored in Prometheus.
```python
def calculate_health(health_items):
    current_health = 100 - max(health_items)  # This takes the max impact across all areas
    #current_health = 100 - sum(health_items)  # This accumulates impact of multiple issues
    HEALTH.set(current_health)
    return
```
Installing the Exporter
I am using CentOS in my environment. The instructions here will be pretty much identical for other RHEL derivatives and will vary slightly for other Linux distributions. I am running the client on a separate system from the Prometheus server. I have Python 3.6.8 and the pip3 tool installed.
Exporter – Prometheus Client installation
As a prerequisite, we need to install 2 Python libraries:
- the Prometheus client
- the “requests” library to help us interact with the PowerStore REST API
pip3 install prometheus-client requests
The HTTP server will expose the metrics over port 8000, so let’s make sure the firewall is not blocking that port.
```shell
firewall-cmd --zone=public --add-port=8000/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
```
Now we can retrieve the code from the GitHub repo.
```shell
git clone https://github.com/cermegno/prometheus-grafana-dell
cd prometheus-grafana-dell
ls -l
```
Now you can edit “powerstore.py” to adapt it to your environment. Make sure to adjust the PowerStore system IP address as well as the credentials in these 3 lines:
```python
baseurl = "https://10.1.1.1/api/rest"
username = "user"
password = "password"
```
For the first try, leave the interval at Five_Mins. If you change it later on, don’t forget to adjust the last line of the code to match the interval. By default it is set to 300 seconds: “time.sleep(300)”.
Once you have finished editing go ahead and run it.
```shell
# python3 powerstore3.py
Point your browser to 'http://<your_ip>:8000/metrics' to see the metrics
...
Collecting now ... collection completed in 4.96 seconds
```
After the first collection is completed, you can open a web browser and point it to the IP address of the machine where the exporter is running. Don’t forget to include port 8000. You should see the metrics. At the top it shows metrics for the client software itself; if you scroll further down you will see all the PowerStore metrics.
Configure the Prometheus server
In order to keep this article a reasonable size we won’t cover how to install the Prometheus server, but don’t worry, it is quite easy. You can find good instructions in this post.
The exporter is exposing the metric readings via HTTP, but unless we give the server instructions, it won’t know about them. The file where Prometheus learns about its clients is “prometheus.yml”.
```shell
# ls -l
drwxr-xr-x.  2 3434 3434        38 Feb  2 23:35 console_libraries
drwxr-xr-x.  2 3434 3434       173 Feb  2 23:35 consoles
drwxr-xr-x. 21 root root      4096 Feb 17 19:15 data
-rw-r--r--.  1 3434 3434     11357 Feb  2 23:35 LICENSE
-rw-r--r--.  1 3434 3434      3773 Feb  2 23:35 NOTICE
-rwxr-xr-x.  1 3434 3434 104422165 Feb  2 23:26 prometheus
-rw-r--r--.  1 3434 3434      1795 Feb 16 15:07 prometheus.yml
-rwxr-xr-x.  1 3434 3434  96324456 Feb  2 23:29 promtool
```
Open it with your favorite text editor and add a new job to the “scrape_configs” section at the bottom. Make sure you specify the IP address of your client/exporter. Notice we are using a 5-minute “scrape interval” to be consistent with everything else.
```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  # In this section the server learns about the PowerStore exporter
  - job_name: powerstore_x
    scrape_interval: 5m
    scrape_timeout: 1m
    metrics_path: "/metrics"
    static_configs:
      - targets: ['10.1.1.1:8000']
```
Now you can restart the server so that it picks up the new configuration.
Now Prometheus will reach out to the client and retrieve the metrics. You can go to the web interface and verify that they have been retrieved. By default the interface is available on port 9090.
Install the Grafana dashboards
The same GitHub repository contains some Grafana dashboards, including:
- A Storage Summary dashboard with all the important PowerStore metrics in a single row. You can duplicate the panels to show all your PowerStore arrays in one place at a high level. The left-most panels include a link to open the relevant “detailed” dashboard for each array
- A PowerStore details dashboard that contains 22 panels including 18 time series graphs covering the performance aspects of the array
- A PowerEdge dashboard that exposes data from the Redfish API such as Temperature, Fans, Voltage and Watts
- A Health dashboard to show a single Health value for each asset. The template provided in the GitHub repo contains only 2 panels: one for a PowerStore and another one for a PowerEdge. Feel free to duplicate those if you have multiple servers and arrays. You can also take it further by adding your own “Health” metrics for other components in your datacenter, whether hardware or software
If you are familiar with Grafana you can import the JSON files and adjust the panels for your environment. If you want to speed up the process a little, you can find and replace the IP addresses in the templates with your own. I have used the following conventions in the templates:
- 10.10.10.10 PowerStore exporter
- 10.10.10.11 PowerEdge exporter
- 10.10.10.100 Grafana itself
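For example, the find-and-replace can be done with sed. The replacement addresses and the inline demo input below are made up; in practice you would run sed with `-i` against the repo’s dashboard JSON files:

```shell
# Order matters: replace the longest address first so that 10.10.10.10
# does not also match inside 10.10.10.100
echo '10.10.10.10 and 10.10.10.100' | \
  sed -e 's/10.10.10.100/192.168.1.60/g' \
      -e 's/10.10.10.11/192.168.1.51/g' \
      -e 's/10.10.10.10/192.168.1.50/g'
# -> 192.168.1.50 and 192.168.1.60
```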
Additionally you will need to adjust the links so that they point to the unique “id” of your “details dashboards”. In a previous blog post we described how to create links between dashboards.