This post is part of Operationalize Your World. Do read that post first to get the context.
The CIO, Head of Global Infrastructure, and other senior IT management have different requirements for dashboards. Generally they want:
- big picture, not details.
- exceptions. Things that need their attention.
- less technical info. Ideally, present in business terms, not IT.
- a portal that is easy to access. They may not want to log in to vR Ops. If they do, they may forget the password. [e1: vR Ops 6.5 cannot do login-less yet]
- UI that is easy to understand. So keep each dashboard to a specific question.
- a system that is easy to use. So keep the interaction (clicking, zooming, sorting, etc.) minimal.
That’s what they want from you.
What do you want from them?
You show them something so you can get help (e.g. budget, resources). Here are some goals:
- Show transparency by giving senior management visibility into the live environment.
- Prove that you do need additional hardware.
- Prove the wastage you have been talking about for months.
What do you not want to show? Urgent issues are something that you should not display. This is not about hiding information from the CIO. It is about giving you the time and space to do your job. If there is an active fire that requires your full concentration, you do not want to be interrupted by the CIO asking why it’s showing red on the dashboard!
I covered dashboard best practice in this post. Read that first, as this blog builds upon that.
We take the same approach we did when planning dashboards for specific roles (e.g. Storage team, Network team). We ask a set of questions.
If we implement the above, we will end up with at least 5 dashboards. I’ve combined some of them. I see a wide variety of requirements, so you will customise them anyway.
- How many VMs are in our cloud? What are their CPU, RAM, and Disk allocations? This gives you the size of the environment the IaaS is supporting.
- How much CPU, RAM, Disk do we have? Is it enough to support the above requirement?
- You should also give the history of VM growth. What is enough today may not be enough in 3 months.
In the dashboard above, I’ve added Availability information. As VMs can be powered off intentionally by the application team, you should only report availability for Tier 1 VMs. Tier 3 VMs, especially those in Test and Dev, can be rebooted frequently and hence will give misleading information.
The dashboard below shows all VMs. In a large environment, the heat map will automatically combine VMs with the same value (read: color & size).
Every VM is represented by a box. The box can take on a value between 0 and 3.
- Green = 0. The VM is served well.
- Yellow = 1. One of the IaaS services is not delivered as per the Performance SLA. We track CPU, RAM and Disk. If your SLA states 10 ms disk latency, then the VM has to get 10 ms or better.
- Orange = 2. Two of the IaaS services are not delivered.
- Red = 3. All 3 services are not delivered.
The VMs are grouped by Datacenter, then Cluster. This lets you see which Datacenter or Cluster isn’t coping well.
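The scoring above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not how vR Ops computes it: the metric names, the CPU and RAM thresholds, and the sample VMs are all assumptions made up for the example. Only the 10 ms disk latency figure comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical SLA limits. Only the 10 ms disk latency comes from the post;
# the CPU and RAM limits are placeholders for whatever your SLA states.
SLA = {"cpu_contention_pct": 1.0, "ram_contention_pct": 1.0, "disk_latency_ms": 10.0}

COLORS = {0: "green", 1: "yellow", 2: "orange", 3: "red"}


@dataclass
class VM:
    name: str
    datacenter: str
    cluster: str
    cpu_contention_pct: float
    ram_contention_pct: float
    disk_latency_ms: float


def sla_score(vm: VM) -> int:
    """Count how many of the three services (CPU, RAM, Disk) breach the SLA."""
    return sum(getattr(vm, metric) > limit for metric, limit in SLA.items())


def group_scores(vms):
    """Group the per-VM scores by datacenter, then by cluster."""
    grouped = defaultdict(lambda: defaultdict(dict))
    for vm in vms:
        grouped[vm.datacenter][vm.cluster][vm.name] = sla_score(vm)
    return grouped


# Made-up sample data.
vms = [
    VM("app01", "DC1", "Cluster-A", 0.5, 0.2, 4.0),   # all services delivered
    VM("db01", "DC1", "Cluster-A", 2.5, 0.2, 25.0),   # CPU and Disk breached
    VM("web01", "DC2", "Cluster-B", 3.0, 1.5, 12.0),  # all three breached
]

for vm in vms:
    score = sla_score(vm)
    print(vm.datacenter, vm.cluster, vm.name, score, COLORS[score])
```

A value exactly at the limit (e.g. 10 ms) counts as delivered here. Adjust the comparison if your SLA is worded differently.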
The above shows the VMs. What about applications? An application spans multiple tiers and multiple VMs. Just because a VM does not perform does not mean the whole application is affected. As this is for Senior Management, we’re only showing the Tier 1 applications.
This blog explains the implementation.
- The CIO is not in charge of capacity management. He just needs to know the decision you want him to make (which is to approve a hardware purchase). For that, he needs to know whether you are running out of capacity, and that existing capacity is not wasted.
- How is it growing? This can be taken care of by having a projection. This projection should take into account committed projects too.
[e1: I will add details here, as the dashboards I have are showing too much detail]
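To make the projection idea concrete, here is a minimal sketch, assuming a simple straight-line trend fitted to monthly history, with committed projects added on top. The function names, the sample numbers, and the choice of a linear fit are my assumptions for illustration. This is not how vR Ops calculates its own projection.

```python
def project_demand(history, months_ahead, committed=0.0):
    """Project demand with a least-squares straight line.

    history      -- monthly totals (e.g. allocated vCPUs), oldest first
    months_ahead -- how far past the last data point to project
    committed    -- demand from approved projects that is not yet deployed
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two months of history")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history)) / sum(
        (x - mean_x) ** 2 for x in range(n)
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + months_ahead) + committed


def needs_purchase(history, months_ahead, committed, capacity):
    """True if projected demand exceeds the capacity we own today."""
    return project_demand(history, months_ahead, committed) > capacity


# Made-up example: demand grows by 10 vCPUs per month, 40 more are committed.
history = [100, 110, 120, 130]
print(project_demand(history, months_ahead=3, committed=40))  # 200.0
print(needs_purchase(history, 3, committed=40, capacity=180))  # True
```

The single yes/no answer from `needs_purchase` is the kind of output that suits a CIO dashboard. The trend line itself belongs on the capacity team's dashboard.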
- Do we have “bad” configuration? Examples are old & unsupported versions of Windows, Linux, ESXi, VMware Tools, etc.
- How uniform is our environment? Complexity is required to optimize cost (hardware, software) and performance. However, there is cost in complexity.
If your CIO does not appreciate the complexity, showing it to the CIO is good for you. It will result in appreciation of your expertise & effort, as the job is certainly easier when the complexity is low. Complexity increases when you have a wide variety of things.
Factors impacting complexity:
- Number of ESXi versions. The more variants, the more complex.
- Number of ESXi CPU versions
- Number of brands. The more vendors, the more complex, as you need to learn them and spend time with their teams.
- Number of cluster node sizes
- Number of shared Datastore sizes
See the examples provided here for Infrastructure and here for VM.
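One simple way to put a number on the factors above is to count the distinct values of each attribute across your inventory. The sketch below assumes you have exported the host inventory as a list of records. The field names and the sample hosts are hypothetical.

```python
def variety(inventory, attributes):
    """Count the distinct values of each attribute across the inventory."""
    return {attr: len({item[attr] for item in inventory}) for attr in attributes}


# Made-up host inventory; the field names are assumptions for the example.
hosts = [
    {"esxi_version": "6.0", "cpu_model": "Haswell", "vendor": "VendorA", "cluster_size": 8},
    {"esxi_version": "6.5", "cpu_model": "Haswell", "vendor": "VendorA", "cluster_size": 8},
    {"esxi_version": "6.5", "cpu_model": "Broadwell", "vendor": "VendorB", "cluster_size": 12},
    {"esxi_version": "5.5", "cpu_model": "Broadwell", "vendor": "VendorB", "cluster_size": 12},
]

counts = variety(hosts, ["esxi_version", "cpu_model", "vendor", "cluster_size"])
for attr, n in counts.items():
    print(f"{attr}: {n} variants")
```

A count of 1 means the environment is uniform for that attribute. The higher the count, the more variants your team has to learn and support.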