When we think about automation use cases, we typically think about automated provisioning and software releases. But this is only a portion of what engineers have to deal with. Sooner or later things break, pagers ring (yes, I guess I am old enough to know what a pager is) and the fun starts. Indeed, I am referring to the support side of our jobs. Day 0 is exciting, but it is easy to forget that the majority of days in the life of a system come afterwards.
In a previous video we showed how to leverage Ansible and REST APIs to automate the provisioning of virtual machines in vSphere and of storage. We did that using ServiceNow as a self-service catalogue to deliver an agile self-service experience on-prem that matches what public clouds provide. So today’s rhetorical question is: can we leverage the same tools to improve support activities? We will explore a few use cases.
For the demo we will use CloudIQ. DellEMC CloudIQ is a cloud-based solution that has traditionally focused on proactive monitoring and predictive analytics. Through consolidated telemetry for all kinds of DellEMC infrastructure and advanced algorithms it has helped organizations reduce risk, plan ahead and improve productivity.
However, the latest release includes two new pieces of functionality that will fundamentally change how CloudIQ is used: the REST API and Webhooks. CloudIQ was great at producing insights about the infrastructure, but now we can build powerful automation workflows to act upon those insights. In this article we will focus on the Webhook functionality. Webhooks are HTTP messages that CloudIQ can deliver to another application with details of events.
If you can’t wait to see it in action, here is the video demo. For a more detailed discussion, keep reading.
The Webhook functionality is available on the “Admin” menu. By default it shows what webhooks have been configured and a list of events that have been sent.
Adding a Webhook is easy. When we click on the “Add Webhook” button, a menu slides out from the right side. We need to provide a name for the webhook and the URL that will receive the webhook messages. At the bottom we can select which systems to send notifications about. This allows us to create different targets for different environments, such as dev/production/DR, or perhaps for different product families (PowerStore, PowerMax …) that need to be handled differently. The “Test Webhook” button is a convenient way for developers to test connectivity.
If we click on any of the events that have already been sent, it brings up the details as shown in the screenshot below. In the Payload section we see the details of the system that generated the event and details about the event itself, such as a description of the problem, the suggested resolution and the health score impact in CloudIQ. The payload has separate sections for “new issues” and for “resolve issues” to better manage the lifecycle of events when used in conjunction with an incident management system.
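As a rough illustration of how a receiver might separate the two sections of the payload, here is a minimal Python sketch. The field names (`system`, `newIssues`, `resolvedIssues`, `impact` and so on) are assumptions made up for this example and are not taken from the CloudIQ schema, so check a real payload in the event details before relying on them.

```python
import json

# Hypothetical payload shape: every field name here is an assumption made
# for illustration, not the documented CloudIQ webhook schema.
SAMPLE_PAYLOAD = json.dumps({
    "system": {"name": "array-01", "model": "PowerStore"},
    "newIssues": [
        {
            "id": "issue-1",
            "description": "File system is full",
            "resolution": "Extend the file system",
            "impact": -20,
        }
    ],
    "resolvedIssues": [{"id": "issue-0"}],
})


def split_issues(raw_body):
    """Return (system, new_issues, resolved_issues) from a webhook body."""
    payload = json.loads(raw_body)
    system = payload.get("system", {})
    new_issues = payload.get("newIssues", [])
    resolved_issues = payload.get("resolvedIssues", [])
    return system, new_issues, resolved_issues


system, new_issues, resolved_issues = split_issues(SAMPLE_PAYLOAD)
print(system["name"], len(new_issues), len(resolved_issues))
```

Keeping new and resolved issues apart from the start makes it straightforward to open an incident for the first group and close one for the second.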
We can also see the “Redeliver” button. This option helps developers test their code by resending an old webhook alert instead of having to artificially create issues in the live environment.
So how can we use the new webhook functionality?
The first use case we will explore in the video demo is Incident Management. ServiceNow is a popular tool for Service Management, but it is also very popular for Incident Management. In this demo we are going to use ServiceNow for that purpose. We have created a small Python application that listens for incoming webhooks. By looking at their payload we can easily create incidents by interacting with the ServiceNow REST API, as we described in our previous article. We can also close them when the payload indicates the issue has been resolved.
The webhook payload also shows the health score impact on the array. This can be used to set the severity of the new incident. Depending on the source and the type of incident, we can assign the incident to the right team to ensure it gets looked at as soon as possible.
Another use case is automated remediation. If an event requires immediate attention and an automated remediation is possible, we can trigger the remediation when a webhook is received. We can do this in addition to the incident management. In the example in the video, an alert gets generated when a file system in a storage array gets full. In this case the workflow automatically:
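The application used in the demo is not reproduced here, but a listener of this kind could look roughly like the following sketch, which uses only the Python standard library. The instance URL, the credentials and the payload field names are placeholders and assumptions. The only ServiceNow-specific part is the Table API endpoint for the incident table.

```python
import base64
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SNOW_INSTANCE = "https://example.service-now.com"  # placeholder instance URL
SNOW_USER, SNOW_PASSWORD = "api_user", "change-me"  # placeholder credentials


def build_incident_request(system_name, issue):
    """Build (but do not send) a POST to the ServiceNow Table API."""
    body = {
        "short_description": f"{system_name}: {issue['description']}",
        "description": issue.get("resolution", ""),
        # Storing the issue id lets us find the incident again on resolution.
        "correlation_id": issue["id"],
    }
    token = base64.b64encode(f"{SNOW_USER}:{SNOW_PASSWORD}".encode()).decode()
    return urllib.request.Request(
        f"{SNOW_INSTANCE}/api/now/table/incident",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


class WebhookHandler(BaseHTTPRequestHandler):
    """Receives webhook POSTs and opens one incident per new issue."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # "system" and "newIssues" are assumed field names (hypothetical).
        system_name = payload.get("system", {}).get("name", "unknown")
        for issue in payload.get("newIssues", []):
            request = build_incident_request(system_name, issue)
            with urllib.request.urlopen(request) as response:
                response.read()
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Start the listener; call this explicitly to run the receiver."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener on port 8080. A production receiver would also need TLS, authentication of the sender and error handling around the ServiceNow call, all of which are left out here for brevity.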
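One possible way to turn the health score impact into a severity and a target team is sketched below. The thresholds, the group names and the assumption that the impact arrives as a (possibly negative) number of health score points are all illustrative choices, not values taken from CloudIQ or ServiceNow.

```python
# Illustrative thresholds and team names; tune them to your own environment.
ASSIGNMENT_GROUPS = {
    "PowerStore": "Storage Operations",
    "PowerMax": "Enterprise Storage",
}
DEFAULT_GROUP = "Service Desk"


def severity_from_impact(impact):
    """Map a health score impact (in points) to a 1-3 severity, 1 = highest."""
    points = abs(impact)  # assume the impact may arrive as a negative number
    if points >= 20:
        return 1
    if points >= 10:
        return 2
    return 3


def route_incident(model, impact):
    """Return the extra incident fields derived from the source and impact."""
    return {
        "severity": severity_from_impact(impact),
        "assignment_group": ASSIGNMENT_GROUPS.get(model, DEFAULT_GROUP),
    }


print(route_incident("PowerStore", -25))
```

The returned dictionary can be merged into the incident body before it is sent to ServiceNow.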
- opens a new incident in ServiceNow
- extends the file system
- if the file system was extended successfully, updates and closes the incident
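The three steps above could be orchestrated along these lines. This is only a sketch: the three callables are hypothetical stand-ins for the real ServiceNow and storage array calls, and the `filesystem` field is an assumed name.

```python
def handle_full_filesystem(issue, open_incident, extend_filesystem, close_incident):
    """Open an incident, try the remediation, close the incident on success.

    The three callables are injected so the workflow can be exercised
    without touching ServiceNow or a real array.
    """
    incident_id = open_incident(issue)
    try:
        extended = bool(extend_filesystem(issue["filesystem"]))
    except Exception:
        # Leave the incident open so a human can pick it up.
        extended = False
    if extended:
        close_incident(incident_id, "File system extended automatically")
    return incident_id, extended


# Stub implementations, for demonstration only.
calls = []


def fake_open(issue):
    calls.append("open")
    return "INC0001"


def fake_extend(name):
    calls.append(f"extend:{name}")
    return True


def fake_close(incident_id, note):
    calls.append(f"close:{incident_id}")


result = handle_full_filesystem({"filesystem": "fs01"}, fake_open, fake_extend, fake_close)
print(result, calls)
```

If the extension fails, the incident stays open, which keeps the manual path available as a fallback.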
For non-urgent events a human confirmation might be preferable. We could notify the relevant operations team in a collaboration tool such as Microsoft Teams. In order to fast track the remediation, the notification card will contain a button to execute the remediation without the need to log in to the element manager. Engineers can then perform the remediation even if they are not at their desks by using the Teams mobile app.
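A notification card with a remediation button might be built as in the sketch below. It assumes a Teams incoming webhook that still accepts the legacy MessageCard format with an `HttpPOST` action. The remediation URL and the issue field names are hypothetical, and newer Teams setups may require Adaptive Cards instead.

```python
import json


def build_teams_card(system_name, issue, remediation_url):
    """Build a legacy MessageCard with a button that triggers remediation."""
    return {
        "@type": "MessageCard",
        "@context": "http://schema.org/extensions",
        "summary": f"CloudIQ issue on {system_name}",
        "title": f"CloudIQ issue on {system_name}",
        "text": issue["description"],
        "sections": [
            {
                "facts": [
                    {
                        "name": "Suggested resolution",
                        "value": issue.get("resolution", "n/a"),
                    }
                ]
            }
        ],
        "potentialAction": [
            {
                # Pressing the button sends a POST to our own remediation
                # endpoint (hypothetical URL supplied by the caller).
                "@type": "HttpPOST",
                "name": "Run remediation",
                "target": remediation_url,
            }
        ],
    }


card = build_teams_card(
    "array-01",
    {"description": "File system is almost full", "resolution": "Extend the file system"},
    "https://automation.example.com/remediate/issue-1",
)
print(json.dumps(card, indent=2))
```

The resulting JSON would be posted to the channel's webhook URL, and the endpoint behind the button should authenticate the caller before changing anything on the array.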
So in summary, CloudIQ has traditionally provided insights about the infrastructure. With the new webhook functionality we can now build powerful automation workflows to act upon those insights.
For additional info and tutorials on CloudIQ webhooks and the REST API, you can visit the Dell Developer portal. Also, please note that the CloudIQ Webhook feature is currently available only to customers who explicitly ask for it. If you want to use this feature, contact the CloudIQ team using the Help / Feedback icon in the top right corner of the CloudIQ UI.
And without further ado, here is the video.