DellEMC

Prometheus Exporter for PowerStore

In the previous blog article we explained how to use the DellEMC PowerStore REST API to extract metrics. In this article we are going to use those skills to create a Prometheus exporter for PowerStore and some useful dashboards in Grafana.

This article will cover the following topics:

  • Quick overview on Prometheus
  • Explanation of our exporter’s code
  • Explanation of our implementation of the Health metric
  • How to install the exporter
  • How to install our Grafana sample dashboards to visualize the data

Disclaimer: The purpose of this article and the code we have provided is educational, and both are provided as-is. Feel free to download the code, play with it, customize it to suit your needs and make it more robust if you want to use it in production.

The code for the exporter has been written in Python. The code and the sample Grafana dashboards featured in this article can be accessed in this GitHub repo: https://github.com/cermegno/prometheus-grafana-dell

Quick Prometheus overview

This is going to be a “very quick” Prometheus overview. If you are more interested in the topic there is plenty of good info out there. If you want me to name one in particular, I think the author of this multi-part blog series covers all the important topics very well.

Prometheus is an open source monitoring system that has become very popular lately, due in part to the success of Kubernetes. The Prometheus architecture has a server component that includes a time series database. On the other side, the client component is responsible for extracting the metrics, but unlike most other solutions it doesn’t push the metrics to the server. Instead it makes them available through an HTTP interface and the server reaches out to get them when appropriate. So it uses a “Pull” mechanism, not “Push”. The piece of software that extracts metrics and exposes them via HTTP is called the “exporter”.

Nowadays when applications talk to each other they usually exchange data in JSON format. Prometheus is unusual in this respect as well because it uses its own text-based exposition format instead.

Each metric has a HELP and a TYPE line in addition to the actual reading. TYPE shows the type of metric, for example counter or gauge. Prometheus offers a few more “metric types” but for our purpose, the Gauge “metric type” covers most if not all the metrics we want to graph. To learn more about all the metric types, I like the way they are described in the GitHub repo of the Python client, including Python examples.

Labels are a way of keeping individual values for different instances of the metric. For example, we will track the read IOPS for each individual volume.
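
To give an idea of what the text-based exposition format looks like, here is a minimal sketch that renders one gauge with a label by hand. The metric name “volume_read_iops”, the label “volume” and the readings are made up for illustration, and the helper “render_gauge” is hypothetical. In the exporter itself the Prometheus Python client produces this text for us.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to the metric reading.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical readings: one value per volume, distinguished by a label
readings = {
    (("volume", "vol01"),): 120.0,
    (("volume", "vol02"),): 45.5,
}
print(render_gauge("volume_read_iops", "Read IOPS per volume", readings))
```

Each volume ends up on its own line, with the label inside curly braces and the reading after it.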

Prometheus offers a very powerful query language called PromQL. On the downside, it doesn’t offer great visualizations for your data, which is why it’s typically combined with Grafana.

Understanding the Exporter code

Writing an exporter for Prometheus involves:

  • Extracting data from the entity you want to monitor. In our case, this will be PowerStore. We will do it using the REST API as described in the previous blog article
  • Exposing the data via a web interface in the text-based format that the Prometheus server expects. Official libraries are available in 5 languages so that we don’t need to create that specific format from scratch:
      • Go
      • Java or Scala
      • Python
      • Ruby
      • Rust

More unofficial client libraries seem to be available here. In our case we are going to use Python.

These are the basic steps in Python. First we import the HTTP server and whatever “metric types” we need. In the exporter we have written, the only metric type required is “Gauge”. The second line creates an instance of “Gauge” that we are calling “CAPACITY_USED”. For clarity we have used upper case to name all the instances we have created. Inside the parentheses you have ‘cap_used’, which is the name under which it will be stored in Prometheus, and ‘Capacity used in GB’, which is the description that will be shown in the text-based information the exporter publishes:

from prometheus_client import start_http_server, Gauge
CAPACITY_USED = Gauge('cap_used', 'Capacity used in GB')

At this point you have to get information from whatever target you are monitoring. In our case we have to query the PowerStore REST API. In our exporter you will see how the details get stored in a variable called “json_resp”. This variable is a list and, as discussed in the previous article, we need to get the values from the last item, which in Python corresponds to index “-1”. Here we “set” the gauge to that value:

CAPACITY_USED.set(json_resp[-1]['physical_used'])

This is the basic process. The exporter does this for capacity and performance metrics for the whole appliance, for each individual volume and for each interface. The different metrics are arranged in separate Python functions for clarity. The process needs to run in a loop with some time between runs.
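
The loop itself can be sketched roughly as follows. This is not the code from the repo: “run_collection_loop”, “collect_capacity” and “collect_performance” are hypothetical names, and the collectors are empty stand-ins for the functions that query the REST API and set the gauges.

```python
import time


def run_collection_loop(collectors, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call every collector, wait, and repeat.

    max_runs=None loops forever, which is what an exporter normally wants.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.time()
        for collect in collectors:
            collect()
        elapsed = time.time() - started
        print(f"Collection completed in {elapsed:.2f} seconds")
        runs += 1
        # Only pause if another run is going to follow
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


# Hypothetical stand-ins for the functions that query the REST API
def collect_capacity():
    pass


def collect_performance():
    pass


# Two quick runs with a short pause, just to show the flow
run_collection_loop([collect_capacity, collect_performance], interval_seconds=0.1, max_runs=2)
```

In a real exporter you would leave “max_runs” unset and use an interval that matches the granularity you request from the array.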

The final step is to make the data available through the web interface. This is accomplished as follows:

start_http_server(8000)

Implementing a Health metric

In another article we covered the topic of creating an overall “health” score for use in Grafana as part of an “operations center” dashboard for all your infrastructure. Let’s see the actual implementation in our PowerStore exporter.

In the function where we extract the capacity metrics for the appliance we can see this section, which calculates the impact of these metrics on the health:

# Health computation section
health_impact = 0
if json_resp[-1]['physical_used'] * 100 / json_resp[-1]['physical_total'] > 80:
    health_impact = 10
if json_resp[-1]['physical_used'] * 100 / json_resp[-1]['physical_total'] > 90:
    health_impact = 20

return health_impact

We start with a health impact of 0. If none of the thresholds have been violated that’s the value we will return. After that we have 2 different thresholds at 80% and 90% capacity utilization. You can play with the impact values yourself. In this example we are using impacts of 10 and 20 points respectively.
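
If you want to experiment with the thresholds, the same logic can be written as a small standalone function. This is only a sketch: the name “capacity_health_impact” and the “thresholds” parameter are our own additions for illustration and do not appear in the exporter.

```python
def capacity_health_impact(physical_used, physical_total, thresholds=((90, 20), (80, 10))):
    """Return the health impact for a given capacity utilization.

    thresholds is a sequence of (percent, impact) pairs, highest percent first.
    """
    used_pct = physical_used * 100 / physical_total
    for percent, impact in thresholds:
        if used_pct > percent:
            return impact
    return 0


print(capacity_health_impact(50, 100))   # below both thresholds
print(capacity_health_impact(85, 100))   # above 80%, below 90%
print(capacity_health_impact(95, 100))   # above 90%
```

Passing a different “thresholds” tuple lets you try other percentages or impact values without touching the function body.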

If you examine the code, you will notice that it also checks the appliance read and write iops against some thresholds. The point here is to illustrate how it works. These thresholds are arbitrary and will vary between array models. The important thing is to learn how it works so that you can manipulate the code to suit your needs by adding a health calculation section to compute whatever metrics you want.

The overall idea is to store all these “health impacts” from many areas in a list called “health_items”. Then, towards the end of the program, another function aggregates all the impacts into one. As you can see in the code below, there are 2 possibilities. By default it takes the largest impact across all areas, or you could accumulate them so that when you have different (and potentially unrelated) issues the health looks critical.

Notice how we created a HEALTH gauge at the beginning and now we are “setting” it so that it gets stored in Prometheus:

def calculate_health(health_items):
    current_health = 100 - max(health_items) # This takes the max impact across all areas
    #current_health = 100 - sum(health_items) # This accumulates impact of multiple issues
    HEALTH.set(current_health)
    return
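
To compare the two aggregation options side by side, here is a small sketch with made-up impact values. The function names are hypothetical, and the floor at 0 in the accumulating version (as well as the default of 0 for an empty list) is an extra safeguard we have assumed, not something the exporter does.

```python
def health_from_max(health_items):
    """Health score when only the worst single impact counts."""
    return 100 - max(health_items, default=0)


def health_from_sum(health_items):
    """Health score when impacts accumulate, floored at 0."""
    return max(0, 100 - sum(health_items))


# Hypothetical impacts: capacity above 90%, a read IOPS warning, writes fine
health_items = [20, 10, 0]
print(health_from_max(health_items))
print(health_from_sum(health_items))
```

With these sample values the first approach yields 80 and the second yields 70, so the accumulating version reacts more strongly when several areas have issues at the same time.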

Installing the Exporter

I am using CentOS in my environment. The instructions here will be pretty much identical for other RHEL derivatives and will vary slightly for other Linux distributions. I am running the client on a separate system from the Prometheus server. I have Python 3.6.8 and the pip3 tool installed.

Exporter – Prometheus Client installation

As a prerequisite, we need to install 2 Python libraries:

  • the Prometheus client
  • the “requests” library to help us interact with the PowerStore REST API

pip3 install prometheus-client requests

The HTTP server will expose the metrics over port 8000, so let’s make sure the firewall is not blocking that port:

firewall-cmd --zone=public --add-port=8000/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all

Now we can retrieve the code from the GitHub repo

git clone https://github.com/cermegno/prometheus-grafana-dell
cd prometheus-grafana-dell
ls -l

Now you can edit “powerstore.py” to adapt it to your environment. Make sure you adjust the PowerStore system IP address as well as the credentials in these 3 lines:

baseurl = "https://10.1.1.1/api/rest"
username = "user"
password = "password"

For the first try, leave the interval set to Five_Mins. If you change it later on, don’t forget to adjust the last line of the code to match the interval. By default it is set to 300 seconds: “time.sleep(300)”.
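
One way to keep the two settings in step is to derive the sleep time from the interval name. The sketch below is our own suggestion rather than code from the repo. The names “INTERVAL_SECONDS” and “sleep_seconds_for” are hypothetical, and the “Twenty_Sec” and “Five_Sec” entries are based on the intervals mentioned in the comments at the end of this article.

```python
# Interval names mapped to the matching pause between collections, in seconds
INTERVAL_SECONDS = {
    "Five_Sec": 5,
    "Twenty_Sec": 20,
    "Five_Mins": 300,
}


def sleep_seconds_for(interval):
    """Return the pause between collections that matches the interval name."""
    try:
        return INTERVAL_SECONDS[interval]
    except KeyError:
        raise ValueError(f"Unknown interval: {interval!r}") from None


interval = "Five_Mins"
print(sleep_seconds_for(interval))
```

The last line of the exporter could then call “time.sleep(sleep_seconds_for(interval))” instead of using a hard-coded number.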

Once you have finished editing go ahead and run it.

# python3 powerstore3.py
Point your browser to 'http://<your_ip>:8000/metrics' to see the metrics ...
Collecting now ... collection completed in  4.96  seconds

After the first collection is completed you can open a web browser and point it to the IP address of the machine where the exporter is running. Don’t forget to include port 8000. You should see the metrics. At the top it shows metrics for the client software itself. If you scroll further down you will see all the PowerStore metrics.
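
If you prefer to check the output from a script rather than a browser, something along these lines should work. It is a rough sketch that uses only the Python standard library. The helper names are hypothetical, and the parser assumes each sample line ends with the value, with no trailing timestamp.

```python
from urllib.request import urlopen


def fetch_metrics(url, timeout=10):
    """Download the raw text the exporter publishes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_samples(text):
    """Return {sample: value} for every non-comment line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the HELP / TYPE comments
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Parsing a small hand-written sample. Against a live exporter you would
# call parse_samples(fetch_metrics("http://<your_ip>:8000/metrics")) instead
sample_text = """# HELP cap_used Capacity used in GB
# TYPE cap_used gauge
cap_used 1234.5
"""
print(parse_samples(sample_text))
```

This makes it easy to confirm that a particular metric, such as “cap_used”, is present and has a sensible value before pointing Prometheus at the exporter.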

Configure the Prometheus server

In order to keep this article of reasonable size we won’t cover how to install the Prometheus server, but don’t worry it is quite easy. You can find good instructions in this post.

The exporter is exposing the metrics readings via http, but unless we give instructions to the server, it won’t know about them. The file where Prometheus learns about the clients is “prometheus.yml”.

# ls -l
drwxr-xr-x.  2 3434 3434        38 Feb  2 23:35 console_libraries
drwxr-xr-x.  2 3434 3434       173 Feb  2 23:35 consoles
drwxr-xr-x. 21 root root      4096 Feb 17 19:15 data
-rw-r--r--.  1 3434 3434     11357 Feb  2 23:35 LICENSE
-rw-r--r--.  1 3434 3434      3773 Feb  2 23:35 NOTICE
-rwxr-xr-x.  1 3434 3434 104422165 Feb  2 23:26 prometheus
-rw-r--r--.  1 3434 3434      1795 Feb 16 15:07 prometheus.yml
-rwxr-xr-x.  1 3434 3434  96324456 Feb  2 23:29 promtool

Open it with your favorite text editor and add a new job in the “scrape_configs” section at the bottom. Make sure you specify the IP address of your client/exporter. Notice we are using a 5-minute “scrape interval” to be consistent with everything else.

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  # In this section the server learns about the PowerStore exporter
  - job_name: powerstore_x
    scrape_interval: 5m
    scrape_timeout:  1m
    metrics_path: "/metrics"
    static_configs:
      - targets: ['10.1.1.1:8000']

Now you can restart the server. In my case:

./prometheus

Now Prometheus will go to the client and retrieve the metrics. You can go to the web interface and verify that they have been retrieved. By default the interface is available on port 9090.

Install the Grafana dashboards

The same GitHub repository contains some Grafana dashboards, including:

  • A Storage Summary dashboard with all the important PowerStore metrics in a single row. You can duplicate the panels to show all your PowerStore arrays in one place at a high level. The left-most panels include a link to open the relevant “detailed” dashboard for each array
  • A PowerStore details dashboard that contains 22 panels including 18 time series graphs covering the performance aspects of the array
  • A PowerEdge dashboard that exposes data from the Redfish API such as Temperature, Fans, Voltage and Watts
  • A Health dashboard to show only a single Health value for each asset. The template provided in the GitHub repo contains only 2 panels: one for a PowerStore and another one for a PowerEdge. Feel free to duplicate those if you have multiple servers and arrays. You can also take it further by adding your own “Health” metrics for other components in your datacenter, whether hardware or software

If you are familiar with Grafana you can import the JSON files and adjust the panels for your environment. If you want to speed up the process a little bit you could find and replace the IP addresses in the templates with your own ones. I have used the following conventions in the templates:

  • 10.10.10.10 PowerStore exporter
  • 10.10.10.11 PowerEdge exporter
  • 10.10.10.100 Grafana itself

Additionally you will need to adjust the links so that they point to the unique “id” of your “details dashboards”. In a previous blog post we described how to create links between dashboards.

8 replies »

    • Hi Edek, thanks for the feedback. The behavior of the PowerStore REST API is to return a maximum of 100 items from a given collection in a single response. If you want to access volumes beyond the first 100, the API call needs to use the “range” parameter in the “header” to specify which items you need. The PowerStore we used in the lab is a test environment with far fewer than 100 volumes so our code didn’t face that difficulty. I will try to look at that in the future. Thanks.

      • Hi Alberto,

        I solved this issue by adding the ‘limit’ parameter with value 500 to the URL. It should be enough for a while 🙂

        url = baseurl + "/volume?select=id,name,type&limit=500"

    • Hi Sujit, the most common way of doing that is to run one exporter per PowerStore. This could be done on a separate Virtual Machine, for example. Then the “prometheus.yml” config file on the server needs to have an additional “job” section under “scrape_configs”, one per exporter. If you have many arrays and you think Virtual Machines are wasteful, you could containerize the exporter and have a separate Kubernetes container for each exporter. Hope it helps

      • Thanks Alberto, for the response. Instead of using multiple virtual machines for each array, as a workaround I have each array using different ports on a single virtual machine.

  1. Hey,

    First of all, great job! One question is still open. Is it possible to get twenty_sec or even lower metrics from the PowerStore? We really need them and at the moment we are exporting them by hand.
    Thanks a lot!
    Best regards

    • Hi Lukas, thanks for the feedback!
      In terms of lowering the granularity, twenty_sec has also been available since the beginning. One thing to watch out for, though, is that the names of some of the metrics change, so make sure you use Postman or cURL beforehand to find out the exact name of the metrics you are after. You will find that, for example, in “Five_Mins” granularity a metric is called “avg_read_iops” whereas with “Twenty_Sec” the same metric is called “read_iops”, i.e. it is pulling the current value as opposed to an average.
      Additionally, with v3.0 now there is a “Five_Sec” interval available. This interval is supported for all physical inventory objects (appliance, cluster, node, initiator, ports, drive, cache, host, hostgroup). However, for objects that can potentially scale to thousands of objects such as “volumes” there is a limit of 100 objects you can monitor at that granularity to make sure the processor is not distracted from its main function which is performing host I/O. You can change the “metric collection granularity” in the GUI by selecting the volumes you want and clicking “More Actions”
      I have read that there is one more “interval” option that was introduced in 3.0 which is “Best_Available”. I haven’t tested it but I think it will give you 5 secs or 20 secs whichever is available.
      Hope it helps!
