The final process we want to discuss is Change Management. This is, without a doubt, one of the most important processes we perform in IT. It is also the process most responsible for maintaining a high-quality Configuration Management System.
The CMDB and Change Management
It is widely held that nearly 80% of all IT outages can be traced to human error. A Forrester study a few years back supported this figure. And those human errors are concentrated in one very visible, very critical process within the ITIL lifecycle: Change Management. We fear change. Systems that don’t change don’t break.
As a system admin, when I got a frantic call from a development team demanding that we make an emergency change because they had promised the business some feature by the end of the week (and the call usually came in on Friday afternoon), my stomach twisted into a little ball. My head began pounding. These are not exaggerations. I went through so many ‘emergencies’ manufactured by over-promising that I began to manifest physical symptoms of a psychological malady. I hated making changes. A change meant I would get exactly no sleep for the next three days while we rolled back, rolled forward, and rolled sideways, trying to keep this cool new feature from bringing down an entire service.
Interestingly, many of the changes that end up causing outages do not manifest as failures. The new feature or changed configuration works exactly as intended. But it breaks something else. Something we missed. Some system that depended on the old configuration to run properly. And those sorts of things don’t often get caught in regression testing and QA. Use cases for the changed service are tested thoroughly. We prove out every possible scenario we can think of in the development, test and staging environments (if we are lucky enough to have all three), the stamp of approval comes down, and we push the change to production. The new cornflower blue bar chart showing how amazingly awesome our sales team is shows up right where it is supposed to; it is clickable; you can drill down into infinite levels of detail. Everyone cheers. And then network printing stops working in the Seattle office. How is that possible?
Well, it turns out that the database server used for the new feature also hosted the print server that everyone in Seattle uses to produce daily reports, some of which are hundreds of pages long (I will not go into how environmentally unfriendly that is, but, well, you know). So when 50 marketing folks started clicking all over this neat new feature, they bogged down the network connection on that database server, and the print server crashed. Incident Management logged into the DB server and restarted the print server, and five minutes later it happened again. If only we had known before the change that this server was multi-purpose. How, oh how, could we ever know such esoteric information?
TA-DA! Configuration Management Database. This is perhaps an extreme example of how a thoroughly tested production change can affect a seemingly unrelated service, but it is not far-fetched, especially in small and mid-sized firms. We don’t have the budget for a separate database server and print server. And that system is not doing a whole lot, most of the time. CPU usage is low, and memory utilization is near nothing. Network activity is pretty flat (except for those two hours every afternoon when Seattle prints the New York City phone directory). This seems like the perfect place to drop a database. And it is free. We do not have to buy infrastructure. This is a super idea, and that is not sarcasm. We want to utilize our hardware, especially if it is just sitting idle most of the time, wasting money. If your CMDB is well-maintained, you will know, well in advance of any change to that server, that it provides two very disparate services. And you can predict, based on your capacity plan (you do have a capacity plan, right? Again, a discussion for another day), how much more utilization to expect when that new feature is added. And if you can say with a high level of confidence that the server cannot handle it, maybe Marketing will be willing to shell out a couple thousand bucks for their own database server.
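To make the CMDB’s role in this story a little more concrete, here is a minimal Python sketch of the two checks described above: finding every service that depends on the server being changed, and a naive capacity projection. It assumes a hypothetical CMDB that stores each relationship as a (dependent CI, supporting CI) pair; the CI names, the load percentages, and the 80% threshold are all illustrative assumptions, not values from any real CMDB product or capacity plan.

```python
from collections import defaultdict, deque

# Hypothetical CMDB relationships: (dependent CI, CI it runs on / depends on).
# All names are illustrative, not from any real CMDB.
RELATIONSHIPS = [
    ("sales-dashboard", "sales-db"),
    ("sales-db", "srv-sea-01"),
    ("seattle-print", "srv-sea-01"),
    ("daily-reports", "seattle-print"),
]


def impacted_cis(target, relationships):
    """Return every CI that depends on `target`, directly or transitively."""
    # Invert the edges so we can walk from a supporting CI to its dependents.
    dependents = defaultdict(list)
    for dependent, supporter in relationships:
        dependents[supporter].append(dependent)

    # Breadth-first search outward from the CI being changed.
    seen = set()
    queue = deque([target])
    while queue:
        ci = queue.popleft()
        for dep in dependents[ci]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def has_headroom(current_peak_pct, added_pct, threshold_pct=80.0):
    """Naive capacity check: does current peak plus expected new load stay under the threshold?"""
    return current_peak_pct + added_pct <= threshold_pct


if __name__ == "__main__":
    print(sorted(impacted_cis("srv-sea-01", RELATIONSHIPS)))
    # Illustrative numbers only: 45% peak network use plus 50% from the new feature.
    print(has_headroom(45.0, 50.0))
```

Run as a script, this prints `['daily-reports', 'sales-dashboard', 'sales-db', 'seattle-print']` and then `False`: a change to `srv-sea-01` touches the Seattle print service as well as the sales database, and the projected load would exceed the assumed threshold. A real capacity plan would model peaks over time rather than simply adding two percentages.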
Now, here’s where we find ourselves with a cart and a horse, and very little direction as to which goes in front. The fact is, Change Management must be fairly mature within the organization (some say CMMI Level 3 at minimum) and well-controlled, with no ad hoc “executive” exceptions, a good Change Manager, and all the trimmings, before your CMDB can be this successful. Why? Because if you do not have solid change control, you will not keep your CMDB up-to-date with any real accuracy. And an inaccurate CMDB is probably less effective than no CMDB at all.
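One way to picture why change control comes first is a toy CMDB whose configuration items can only be modified by applying an approved change record. The sketch below is a minimal illustration under that assumption; `ChangeRecord`, `CMDB`, and `apply_change` are hypothetical names, not the API of any real CMDB tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRecord:
    """Hypothetical change record: which CI changes, how, and whether it was approved."""
    change_id: str
    ci_name: str
    updates: dict
    approved: bool = False


@dataclass
class CMDB:
    """Toy CMDB that only accepts updates arriving through approved changes."""
    cis: dict = field(default_factory=dict)    # CI name -> attribute dict
    audit: list = field(default_factory=list)  # (change_id, ci_name, updates) history

    def apply_change(self, change: ChangeRecord) -> None:
        # No approval, no update: there is no "executive exception" path.
        if not change.approved:
            raise PermissionError(f"{change.change_id} has not been approved")
        if change.ci_name not in self.cis:
            raise KeyError(f"{change.ci_name} is not a registered CI")
        self.cis[change.ci_name].update(change.updates)
        # Record which change touched which CI so the history stays traceable.
        self.audit.append((change.change_id, change.ci_name, dict(change.updates)))


if __name__ == "__main__":
    cmdb = CMDB({"srv-sea-01": {"roles": ["print"]}})
    cmdb.apply_change(
        ChangeRecord("CHG-001", "srv-sea-01", {"roles": ["print", "database"]}, approved=True)
    )
    print(cmdb.cis["srv-sea-01"])
    try:
        cmdb.apply_change(ChangeRecord("CHG-002", "srv-sea-01", {"roles": []}))
    except PermissionError as err:
        print("rejected:", err)
```

The design choice is that `apply_change` is the only update path, so every CI modification is tied to a change ID and an unapproved change never reaches the CMDB. In a real organization, that guarantee comes mostly from process discipline rather than code, which is exactly why mature change control has to be in place first.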
Configuration Management is tedious. It is hard. It takes a small army of people to make sure it works within the organization. But with high quality change control, it can be achieved. And it CAN bring great value to the organization.