Finding ROI in a Configuration Management Database (Part 3)

Written by Mark Hillyard

The second process that directly benefits from a well-maintained CMDB is Problem Management. Here, we’ll focus on the specific wins we can expect to see in this much-maligned process.

The CMDB and Problem Management

The process we all love to hate. We all want to do it, but we are all so buried in firefighting and break-fix that, well, who has the time? Some shops, and I include some of my own previous employers among them, tend to treat problem management as something between major incident management and finger-pointing 101. I cannot really explain why this is, but it definitely is not what ITIL best practice intends. Problem Management is meant to be a proactive, enhancing, and ultimately cost-saving process that runs continuously across the entire environment. Unfortunately, for the reasons stated, it gets relegated to the dustbin of, “Well, we don’t know why it broke, but someone is to blame for it!”

So, how do we reverse this trend, and how can a CMDB help? Well, the answer to the first question is very dependent on an organization’s particular situation, but I can say that if you start approaching Problems more as research exercises, rather than trying to figure out who blew it on that last deployment, you may find yourself fixing things before there are incidents, and that is how it should work.

As for your CMDB and Problem Management, think of it this way: when a service goes down, especially a tier 1 or tier 2 type service (say, e-mail, just to keep my examples consistent), there is a flurry of activity. Incident Management swings into action, getting things back up with duct tape and bubble gum if necessary. Then, when everything quiets down, management wants some sort of After-Action report, perhaps a meeting on what happened, why it all went plaid. So, we call up our Problem Manager (probably the only member of the Problem Management team) and make him or her write up metrics and monitoring reports, get it together so we can sit down 14 hours after the outage to discuss what went wrong. And maybe we get it right. Maybe we can say, with great certainty, why the service failed. But often, this is just an educated guess, based on previous outages, and related changes, and a healthy dose of wordsmithing.

Now, enter the CMDB, with its ridiculous amount of data at our fingertips. If we are keeping it up to date, maintaining system baselines, updating CIs after changes, etc., we suddenly have, in a completely relational and meaningful package, all of the components that make up the service (hmm…traceability, anyone?). Now your overworked Problem Manager can see, quite easily, where things went poorly. And, not only that, we can use the same data as a predictor of future performance. So, the Problem Manager can not only appease management with cold, hard facts about the systems involved, he or she can also suggest a real solution based on the available data. And all of this because we maintained our CMDB properly. Instead of promising the Marketing VP that it will never happen again in the middle of one of her amazing demos, we can say, with great certainty, that we know exactly where things went badly, and precisely what will ensure it doesn’t go that way again.
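To make the traceability idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `CI` class, its `depends_on` and `changes` fields, the CI names, and the 48-hour lookback window are hypothetical and do not reflect any particular CMDB product's schema. It simply walks from a service down to every component it relies on, then lists the changes recorded against those components shortly before an outage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class CI:
    """A configuration item. These fields are hypothetical, not a real CMDB schema."""
    name: str
    depends_on: list[str] = field(default_factory=list)
    changes: list[tuple[datetime, str]] = field(default_factory=list)


def components_of(cmdb: dict[str, CI], service: str) -> list[str]:
    """Return every CI the service depends on, directly or transitively."""
    seen: list[str] = []
    stack = list(cmdb[service].depends_on)
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited; also guards against dependency cycles
        seen.append(name)
        stack.extend(cmdb[name].depends_on)
    return sorted(seen)


def recent_changes(cmdb: dict[str, CI], service: str, outage: datetime,
                   window: timedelta = timedelta(hours=48)):
    """List changes made to the service's components in the window before the outage."""
    hits = []
    for name in components_of(cmdb, service):
        for when, summary in cmdb[name].changes:
            if outage - window <= when <= outage:
                hits.append((when, name, summary))
    return sorted(hits)  # oldest change first


# Made-up example data for an e-mail service.
outage = datetime(2013, 4, 2, 9, 30)
cmdb = {
    "email": CI("email", depends_on=["mail-server", "spam-filter"]),
    "mail-server": CI("mail-server", depends_on=["storage-array"],
                      changes=[(datetime(2013, 4, 1, 22, 0), "Patched mail transport agent")]),
    "spam-filter": CI("spam-filter",
                      changes=[(datetime(2013, 2, 10, 8, 0), "Rule update")]),
    "storage-array": CI("storage-array",
                        changes=[(datetime(2013, 4, 2, 7, 15), "Firmware upgrade")]),
}

for when, name, summary in recent_changes(cmdb, "email", outage):
    print(when, name, summary)
```

With this sample data, the two changes that fall inside the window (the mail server patch and the storage firmware upgrade) are listed, while the older spam filter rule update is left out. A listing like this does not prove root cause, but it gives the Problem Manager a short, factual set of candidates to investigate, and it is only as good as the change records kept in the CMDB.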

Now, is the CMDB some sort of wizard that can conjure solutions for you? Of course not, but having well-documented, highly integrated data at your disposal can go a long way toward creating a positive approach to root cause analysis.

Additionally, with the proper tools in place, we can even create visual references of how CIs interact with one another, including changes and requests associated with each component. A good visualization tool can even show components that are currently down, and how that is impacting components and services around them.
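As a rough illustration of what such a tool does under the hood, the sketch below takes a dependency map, works out which CIs are affected when one component is down, and emits Graphviz DOT text with the failed and impacted nodes colored. The dependency map, the CI names, and the color choices are all assumptions made up for this example; a real visualization tool would pull relationships from the CMDB itself, and the sketch assumes every CI appears as a key in the map.

```python
def impacted_by(depends_on: dict[str, list[str]], down: str) -> set[str]:
    """Return every CI that depends, directly or transitively, on the CI that is down."""
    # Invert the edges so we can walk from a component up to whatever relies on it.
    dependents: dict[str, list[str]] = {}
    for ci, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, []).append(ci)

    impacted: set[str] = set()
    queue = [down]
    while queue:
        current = queue.pop()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


def to_dot(depends_on: dict[str, list[str]], down: str) -> str:
    """Render the dependency map as Graphviz DOT, highlighting the outage and its impact."""
    impacted = impacted_by(depends_on, down)
    lines = ["digraph cmdb {"]
    for ci in sorted(depends_on):
        if ci == down:
            lines.append(f'  "{ci}" [style=filled, fillcolor=red];')
        elif ci in impacted:
            lines.append(f'  "{ci}" [style=filled, fillcolor=orange];')
        for target in depends_on[ci]:
            lines.append(f'  "{ci}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Made-up example: each key depends on the CIs in its list.
depends_on = {
    "email": ["mail-server"],
    "calendar": ["mail-server"],
    "mail-server": ["storage-array"],
    "storage-array": [],
    "wiki": [],
}

print(sorted(impacted_by(depends_on, "storage-array")))
print(to_dot(depends_on, "storage-array"))
```

The DOT output can be pasted into any Graphviz renderer to get a picture of the outage: the failed storage array in red, the mail server and the services built on it in orange, and the unrelated wiki left untouched.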

So, really, building and maintaining a great CMDB tells you not only what you have, but also how to manage it better to avoid incidents down the road.

Onward to part four, “The CMDB and Change Management” >>>


Originally published April 04 2013, updated January 01 2019