IT Focus Area: infrastructure operations
July 28, 2012
Automating the Mundane
It is hard to imagine a world without automation. From the industrial revolution to the explosion of information technology (IT), automation has touched virtually every industry, enabling us to work faster, more efficiently and more productively. Today, automation is an integral part of business for companies in many different industries.
IT is the key to modern automation, but, ironically, IT operations have benefited little from automating their own processes. More than half of the employees in IT departments manage and support established infrastructure and applications rather than plan and develop new solutions. As a result, many IT departments never get the opportunity to become centers of innovation.
Although most IT operations tasks are quite mundane, they typically consume most IT resources, especially when outsourced vendor maintenance and support services are included. The reason is that even mundane tasks require some technical expertise, and they often, and unpredictably, require rapid intervention by highly skilled engineers. Tiered support structures help optimize the use of highly skilled labor, but valuable engineering and management expertise is still consumed by common, recurring issues. Besides being an unproductive use of resources, the mundane nature of operational tasks and the need for around-the-clock and off-hours coverage are a source of dissatisfaction for many skilled IT employees.
Chief information officers (CIOs) clearly would rather have most of their IT talent focused on strategy, innovation and agility than on repetitive operational tasks. This is why deploying the right automation solution can be a strategic initiative rather than simply a cost-saving measure. Ultimately, the goal is to turn the IT department into a proactive strategic partner for the business rather than a reactive daily order taker.
The State of IT Automation Today
IT automation, especially in its simple forms like scripts and macros, has been around for years. However, it has not been subjected to rigorous analysis or classification. IT executives who are now exploring various approaches to achieving the benefits of automation are finding that there is no mature framework to articulate requirements, compare solutions or develop a roadmap.
This article is a first step toward presenting a framework to help IT develop requirements and create a roadmap for automation.
Automating IT operations requires a level of standardization and two basic capabilities: decision-making and executing. Executing specific tasks is actually easy and can be done using macros, scripts, and various programmed applications. Deciding what to do, on the other hand, is hard.
IT event resolution typically requires long sequences of decisions and tasks in bewildering combinations that form large, essentially undocumented decision trees. For example, a programmer who wants to write a program to resolve a database memory leak first has to learn the steps from several operations experts who, together, can cover the many possibilities. Then, he or she has to hardcode a huge number of action sequences (such as configuration lookups or diagnostics) that mostly generate intermediate results, each followed by a decision that evaluates those results to determine the next action, and so on. These actions, and the decisions that evaluate them, make up a decision tree. Navigating such a tree requires associative recognition, not linear processing, so automating IT decision trees with standard programming is almost impossible in practice. Thus, whereas automating specific tasks with programs or scripts is easy, automating an entire event resolution sequence is prohibitively expensive.
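To make the cost of hand-coding concrete, here is a minimal Python sketch of one tiny branch of such a decision tree. Everything in it is hypothetical and chosen only for illustration: the `Diagnostics` fields, the thresholds and the action names are assumptions, not taken from any real database or tool. Each `if` evaluates an intermediate result produced by an earlier diagnostic task, and every path has to be anticipated and written by the programmer; a realistic tree would have far more branches.

```python
from dataclasses import dataclass


@dataclass
class Diagnostics:
    """Hypothetical intermediate results produced by earlier diagnostic tasks."""
    memory_growth_mb_per_hour: float
    cache_size_mb: float
    cache_limit_mb: float
    open_sessions: int
    session_limit: int
    recent_patch: bool


def resolve_db_memory_leak(d: Diagnostics) -> str:
    """Hand-coded decision tree: each branch evaluates an intermediate
    result and picks the next action. All thresholds are illustrative."""
    if d.memory_growth_mb_per_hour < 10:
        return "monitor"              # growth too slow to act on yet
    if d.cache_size_mb > d.cache_limit_mb:
        return "flush_cache"          # cache has outgrown its configured limit
    if d.open_sessions > d.session_limit:
        if d.recent_patch:
            return "roll_back_patch"  # session leak may have arrived with a patch
        return "kill_idle_sessions"
    return "escalate_to_tier_3"       # no hand-coded branch matched


if __name__ == "__main__":
    sample = Diagnostics(
        memory_growth_mb_per_hour=50,
        cache_size_mb=900,
        cache_limit_mb=1024,
        open_sessions=600,
        session_limit=500,
        recent_patch=False,
    )
    print(resolve_db_memory_leak(sample))  # kill_idle_sessions
```

Every new database version, diagnostic or policy change would mean editing branches like these by hand, which is why the approach scales so poorly.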
An automation engine is an IT system that provides these two capabilities (decision-making and execution) in order to manage other IT systems. IT automation engines still require human oversight, and it is the degree and nature of that oversight that determine the benefits of automation and the productivity of the operator.
Unlike in manufacturing operations, most decisions in IT operations, along with the research required to make them, must often be made in real time. Historically, automation engines have relied heavily on people to make almost all major decisions. In practice, that means people still do most of the IT operations work, because researching events, analyzing them and deciding on next steps make up a huge portion of the work, even for mundane tasks. As a result, traditional automation engines, and whatever management suites they belong to, present the human operator with a “pane of glass” to control IT assets and a mechanism to execute actions. But it is the operator who holds the knowledge base, and it is the operator who must take action.
A new generation of automation engines is now available. At their core, the new engines have expert systems that can handle IT event resolution decision trees. Unlike automation systems that rely on traditional programming, expert systems use a knowledge base of learned past events to make decisions, so no programming is required. Such an engine uses pattern recognition to evaluate a situation and decide on the next action, based on which actions worked in the past in matching situations. It escalates to a human operator only if it cannot find a match with a high degree of certainty.
With such an engine, a large portion of the work can be automated, and with greater reliability than the traditional mode of operation provides. Unlike people, who tend to get bored and make mistakes when performing repetitive tasks, expert systems do not get bored. New events are “learned” from the human IT operators, not programmed by software developers twice removed from the scene. This also greatly reduces the lag between the emergence of a new automation need and its availability, which matters because technology and IT policies change rapidly over time and differ from company to company.
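The learn, match and escalate cycle described above can be illustrated with a minimal Python sketch. It does not reproduce how any particular product works: it assumes each event is described as a set of hypothetical feature labels, and it substitutes a simple nearest-neighbor match using Jaccard similarity (shared features divided by all features) for a full expert system. The 0.8 confidence threshold and all names are illustrative assumptions. Operators teach the engine by recording how they resolved an event; nothing is programmed per event.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass
class KnowledgeBase:
    """Hypothetical store of past events, each described as a set of
    feature labels, paired with the action an operator used to resolve it."""
    threshold: float = 0.8  # illustrative minimum similarity to act without a human
    cases: List[Tuple[FrozenSet[str], str]] = field(default_factory=list)

    def learn(self, features: Set[str], action: str) -> None:
        """Record how a human operator resolved an event (no programming)."""
        self.cases.append((frozenset(features), action))

    @staticmethod
    def similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
        """Jaccard similarity: shared features divided by all features."""
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def decide(self, features: Set[str]) -> str:
        """Reuse the action from the most similar past event, or escalate
        to a human operator when no match reaches the threshold."""
        current = frozenset(features)
        best_score = 0.0
        best_action: Optional[str] = None
        for past, action in self.cases:
            score = self.similarity(past, current)
            if score > best_score:
                best_score, best_action = score, action
        if best_action is None or best_score < self.threshold:
            return "escalate_to_operator"
        return best_action


if __name__ == "__main__":
    kb = KnowledgeBase()
    event = {"db", "memory_high", "sessions_high"}
    print(kb.decide(event))                # escalate_to_operator (nothing learned yet)
    kb.learn(event, "kill_idle_sessions")  # an operator resolves it once
    print(kb.decide(event))                # kill_idle_sessions (similarity 1.0)
    print(kb.decide({"db", "disk_full"}))  # escalate_to_operator (similarity 0.25)
```

The threshold is the design lever here: raising it makes the engine escalate more often but act only on close matches, which mirrors the "high degree of certainty" requirement in the text.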
The term “orchestrators” is now a popular way of describing automation engines that interface with several IT systems simultaneously. Of course, the decision trees required to resolve general events across multiple systems are even more complicated than those for events that touch only one system. So, orchestrators with traditional automation engine cores are usually used only for predefined tasks, such as provisioning or resource management (for example, defining a new virtual machine with the required hardware resources, network access and storage), where they yield significant savings in labor and time. For the reasons explained above, however, they are not game changers when it comes to day-to-day operations. Orchestrators with an expert system core can be.
How to Get Started with Automation
So, how does an IT organization get started with automation? First, consider how IT tasks are usually classified. To reduce the workload on the more skilled operators, each event or request typically passes through up to three tiers of support. The first tier usually filters events using operations scripts or runbooks: simple events are resolved there, and the more difficult ones are passed to tier two.
Tier one usually does not involve any engineering. Tier two requires some engineering and is usually where most operations resources are spent. Tier three requires deep knowledge as well as human thinking, deduction and creativity, so tier-three operations are typically not good candidates for automation. However, collecting the information needed for this work can usually be automated, which increases engineers’ productivity and job satisfaction.
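The tier structure can be pictured as a simple routing function. In this minimal Python sketch, the runbook entries, event fields, handler and action names are all hypothetical. Tier one is a scripted runbook lookup; tier two is represented by a caller-supplied handler (an engineer's procedure or an automation engine) that returns `None` when it cannot resolve the event; tier three is left to a human, but the context for that engineer is gathered automatically, as the text suggests.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical tier-one runbook: event type -> scripted action.
RUNBOOK: Dict[str, str] = {
    "disk_space_low": "purge_temp_files",
    "service_down": "restart_service",
}


def collect_context(event: Dict[str, str]) -> str:
    """Automated evidence gathering for a tier-three engineer (illustrative:
    a real version might pull logs, configuration and recent changes)."""
    return f"type={event['type']} host={event.get('host', 'unknown')}"


def route(event: Dict[str, str],
          tier_two: Callable[[Dict[str, str]], Optional[str]]) -> Tuple[int, str]:
    """Pass an event through up to three support tiers and report
    which tier handled it and what was done."""
    # Tier one: runbook lookup, no engineering required.
    action = RUNBOOK.get(event["type"])
    if action is not None:
        return 1, action
    # Tier two: some engineering; the handler returns None if it cannot resolve.
    action = tier_two(event)
    if action is not None:
        return 2, action
    # Tier three: a human decides, but the context is collected automatically.
    return 3, "ticket for engineer: " + collect_context(event)


def db_tier_two(event: Dict[str, str]) -> Optional[str]:
    """Hypothetical tier-two procedure covering a single event type."""
    return "flush_cache" if event["type"] == "db_memory_high" else None


if __name__ == "__main__":
    for ev in ({"type": "service_down", "host": "web01"},
               {"type": "db_memory_high", "host": "db01"},
               {"type": "network_flap", "host": "sw03"}):
        print(route(ev, db_tier_two))
    # (1, 'restart_service')
    # (2, 'flush_cache')
    # (3, 'ticket for engineer: type=network_flap host=sw03')
```

Framed this way, the automation targets recommended below are the `RUNBOOK` and `tier_two` paths, while tier three gains only automated context collection.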
In the context of your long-term business plan, identify the areas of IT operations that consume a majority of your resources across all tiers. Start with operations work items that are concentrated in tiers one and two. These are the primary targets for automation. Use the automation framework to determine the requirements that the automation solution should meet.
Next, decide how to achieve your automation objectives. Automation, like operations in general, can be achieved through either a tool or a service. The IT organization may deploy tools to help automate its operations, or it may outsource certain operations to managed service providers (MSPs), who in turn manage the target assets automatically. In most cases, the choice between tools and services is a tradeoff between capital and operational expenses.
Tools with high long-term value usually require a substantial initial investment and intense management focus, both to deploy them and to adjust the business processes around them. Without those process adjustments, the value of such tools is severely diluted. Done right, however, deploying the right tools can generate a very good return on investment.
For enterprises that do not want to make the initial investment or retool their operations and management processes, the services route is the obvious choice. Managed service providers with the right capabilities already have the tools and the expertise, and they can infuse an organization with knowledge (and a knowledge base) gleaned from many other clients.
Using automated systems enables an IT organization to focus on its company’s core business rather than on operating what is already there. Modern automation technology makes it possible to shift IT resources toward innovation and growth, and to achieve many of the benefits of cloud computing without moving to the cloud. For companies that do want to migrate to the cloud, it enables uniform operations regardless of where the resources are housed or who owns them. At the same time, it cuts costs and increases quality.