“Artificial Intelligence, deep learning, machine learning: whatever you’re doing, if you don’t understand it, learn it. Because otherwise you’re going to be a dinosaur within 3 years.”
—Mark Cuban, Upfront Summit 2017
Artificial Intelligence for IT Operations, more commonly known as AIOps, is a platform approach that applies big data and machine learning, through custom-built algorithms, to discover infrastructure, identify anomalies, and in some cases resolve incidents and eliminate threats. Given how quickly technology changes, the concept is no longer new: the term was coined by Gartner back in 2017. But in the span of just three years, AIOps has become an incredibly powerful area of advancement that companies of all stripes are scrambling to adopt. As companies work ever harder to shift culturally and operationally through digital transformation into the next phase of emerging technology, there is little reason to doubt that automation will play a crucial role in every industry.
Managing the Ocean of Data through AI and Machine Learning
The problem that IT operations faces today is one of correlation and causation within the veritable ocean of data that modern systems produce. An operations team may well see a failure, and perhaps even repair it efficiently, yet still miss the underlying correlation. When that happens, it becomes increasingly unlikely that the root cause of the failure will ever be discovered.
Monitoring solutions managed, for many years, to keep up with the volume and velocity of data that systems produce, which allowed operations teams to manually correlate events and establish a causal link. That is no longer the case, and the gap has created an entire market of tools that do this work for us. AIOps uses carefully designed algorithms, along with rules and conditions developed by operators, to effectively “learn” how to deal with situations as they arise. The value of this advancement quickly becomes obvious in a data-centric operational environment. But it is the extension of this ability into the world of information security that has yet to be truly realized.
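To make the idea of event correlation concrete, here is a minimal sketch of the kind of grouping an operator once did by hand: events from different systems that occur close together in time are collected into one candidate incident. The function name, the event fields, and the 60-second window are all assumptions chosen for illustration; a real AIOps platform would use far richer signals than timestamps alone.

```python
from datetime import datetime, timedelta


def correlate_events(events, window_seconds=60):
    """Group events that occur close together in time.

    `events` is a list of dicts with hypothetical keys "time" (datetime),
    "source" and "message". An event joins the current group when it
    arrives within `window_seconds` of the previous event in that group.
    """
    window = timedelta(seconds=window_seconds)
    groups = []
    for event in sorted(events, key=lambda e: e["time"]):
        if groups and event["time"] - groups[-1][-1]["time"] <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


# Illustrative data only: two events seconds apart, one much later.
start = datetime(2020, 1, 1, 12, 0, 0)
sample_events = [
    {"time": start, "source": "db01", "message": "connection pool exhausted"},
    {"time": start + timedelta(seconds=10), "source": "web03", "message": "HTTP 500 spike"},
    {"time": start + timedelta(minutes=5), "source": "cache02", "message": "node restarted"},
]
incidents = correlate_events(sample_events)
```

With this sample data, the database and web server events land in the same group, hinting that they share a cause, while the later cache restart stands alone.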
Security Information and Event Management (SIEM) Limitations
Cyber security has risen to the level of a mission-critical concern in nearly every organization on the planet. If a company has not yet established its own security posture and risk appetite, its chances of being compromised are likely near 100%. Couple this with the fact that cyber security currently faces a shortage of several hundred thousand practitioners (and growing every year), and it is obvious that the industry, at least for now, belongs to the bad actors. The question for many organizations is not if they will be successfully attacked, but when.
Security information and event management (SIEM) tools today handle a lot of data aggregation, reporting, and pattern matching, that is, preprogrammed actions taken in response to specific expected results. However, this does not rise to the level of true machine learning. Even in its current, fledgling state, AIOps is light years ahead of existing event and monitoring solutions.
This is not to say that machine learning is ready to replace a human security incident response team. There are still too many subjective variables in most security events to completely eliminate human judgment. However, there are several benefits that an AIOps platform can provide that can reduce the number of resources needed to sift through the mountains of log data most environments produce.
One key benefit of such a platform is its ability to speed up response time through automated threat hunting. This means that, based on correlation rules your team sets up, an AI tool can quickly identify false positives and even halt suspicious processes long before an investigation begins. This not only helps with investigation and threat remediation, but it can mitigate damage before the response team swings into action. This is a huge win for any organization because time is almost always of the essence during a security event.
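As a rough illustration of team-defined correlation rules, the sketch below sorts incoming alerts into three outcomes: dismiss as a false positive, halt immediately, or hand to a human. Everything here is hypothetical, including the `Alert` fields, the blocklist entry, and the failed-login threshold; it shows the shape of rule-based triage, not any particular product's behavior.

```python
from dataclasses import dataclass

# Hypothetical blocklist; a real deployment would load this from threat intel.
BLOCKED_PROCESSES = frozenset({"cryptominer.bin"})


@dataclass
class Alert:
    host: str
    process: str
    failed_logins: int
    in_maintenance_window: bool


def triage(alert, max_failed_logins=5):
    """Apply simple, ordered rules and return a triage decision."""
    # A blocklisted process is halted no matter what else is true.
    if alert.process in BLOCKED_PROCESSES:
        return "halt"
    # Noise generated during planned maintenance is treated as a false positive.
    if alert.in_maintenance_window:
        return "false_positive"
    # Repeated failed logins outside maintenance look like an attack in progress.
    if alert.failed_logins > max_failed_logins:
        return "halt"
    # Anything else still needs human judgment.
    return "investigate"


decisions = [
    triage(Alert("web01", "cryptominer.bin", 0, True)),
    triage(Alert("db02", "backup.sh", 12, True)),
    triage(Alert("app03", "sshd", 12, False)),
    triage(Alert("app04", "nginx", 1, False)),
]
```

The rule order is a design choice: checking the blocklist first means a maintenance window can never be used to mask a known-bad process.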
Additionally, a machine learning system acquires threat intelligence over time, making it more and more effective at predictive threat hunting. Because AI is able to consume and correlate large amounts of data in a fraction of the time it would take even the most experienced human threat hunter, the system can stop threats that might take a team days, or even weeks, to discover.
As infrastructures scale, SIEM tools develop security blind spots. These are areas of the infrastructure where logging may not be appropriately robust, or patching has not yet been fully implemented. Further, these blind spots are some of the most attractive attack vectors for external hackers. With most security teams already stretched beyond their resource limits, some blind spots may persist for long periods of time. AIOps platforms can limit, and in some cases even eliminate, these so-called blind spots. Aggregation and correlation of data is their primary function, so when new or different data begins to pour in, the platform will recognize the change and begin to report anomalies immediately. This ability to detect variances in existing infrastructures is an invaluable skill that really only exists within a system capable of learning and adapting as things change.
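One simple way to picture "recognizing the change" is a statistical baseline. The sketch below flags a log volume that deviates sharply from recent history and lists log sources that have never been seen before. The function names, the z-score approach, and the threshold of 3 are assumptions made for illustration; production platforms typically rely on more sophisticated, adaptive models.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True when `latest` is far from the historical mean.

    Uses a z-score: the distance from the mean measured in standard
    deviations. With fewer than two data points there is no baseline,
    so nothing is flagged.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


def new_sources(known, current):
    """Return log sources present now that were never seen before, sorted."""
    return sorted(set(current) - set(known))


# Illustrative numbers only: events per minute from one host.
baseline = [100, 102, 98, 101, 99]
normal_minute = is_anomalous(baseline, 100)
burst_minute = is_anomalous(baseline, 500)
unseen = new_sources(["web01", "db01"], ["web01", "db01", "pod-7f3a"])
```

Here the ordinary reading passes, the sudden burst is flagged, and the short-lived container `pod-7f3a` is surfaced as a source that no rule had been written for.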
Where does AIOps fit into managing increased data needs?
Operations has traditionally handled all the data fed from various infrastructure and applications through standard monitoring and event management tools and a great deal of manual observation. However, digital transformation (DT) has changed the game. As more and more of the strategy of an organization is dependent on digital enablement, the data stored, and ultimately leveraged, to drive operations has expanded beyond the boundaries of traditional reporting and analysis. In fact, the idea of “big data” has transcended the normal monitoring and event management practice of most organizations. Where once we had infrastructure bound by hardware and datacenter technology, now there is software-defined infrastructure. Through virtualization, cloud technologies, and an ever-increasing need for smaller, more dynamic containers for our systems, we have created massively complex, dynamic, and even ephemeral ecosystems where a single microservice may only exist for milliseconds.
The effect of all this digitalization is technology that is nearly impossible for a team of people to manage on their own. Traditional monitoring and reporting tools and techniques fall short when the systems being monitored are changing in the literal blink of an eye.
Digital Transformation and AIOps: A Brief History
Digital transformation is nothing new. It has been in the boardrooms and business strategy plans of businesses across the spectrum, regardless of size, type, or industry, for years. For a more robust introduction to DT take a look at this great article detailing how digital transformation might take shape within a staid and traditional organization. But how does that relate to AIOps and security? The simple answer is: big data. At its core, digital transformation is about generating more valuable data. That is, no doubt, an oversimplification, but from a technology standpoint, data is the key to both the realization and success (or failure) of any digital strategy.
Data has become what many believe is the precious resource of the future. The most common parallel drawn is to oil. In fact, over the past several years a great debate has sprouted up about whether big data is the new oil or not. I am more persuaded by the parallel James Bridle draws in his book, New Dark Age: Technology and the End of the Future: “Data isn’t the new oil, it’s the new nuclear power.” Particularly compelling is the idea that data, unlike oil and much more like nuclear power, is unlimited in quantity. Even more fascinating is the assertion that it is also unlimited in its capacity to do harm. While the book details the political, environmental, and cultural conflicts that have arisen over data, its usage and proliferation, and the consequences of de-privatization of our own personal information, I tend to see the dangers from an information security standpoint. As data becomes harder to manage, while at the same time more critical to daily operations, its protection becomes key to the survival of every organization. We have all become data brokers, and our stock in trade is one of the most difficult commodities to manage effectively.
AIOps: The Future of Cyber Security and Big Data
There is great promise in AIOps across the technology enterprise. The reality today is that AIOps platforms can be applied in many ways to augment overburdened operations teams, providing much-needed automation and scale that have not been fully realized before now. It may seem that AIOps is fit only for large environments, but the fact is, organizations of all sizes can take advantage of this emerging technology. Within the context of security, machine learning is even more crucial, as attacks become more advanced and complex every day. As operations staff shift closer to development, event management necessarily becomes more automated. And every organization will need a solid platform to handle the day-to-day needs of ever-expanding infrastructure. AIOps is the key to successfully responding to and solving today’s big data and cyber security challenges.