How we made support more transparent for business with a service approach
Hello, Habr! My name is Dima, and I head the back-office IT department at Petrovych Tech. We develop IT solutions for Petrovych, a large federal retailer in the DIY and building products segment.
Over the past few years, the company has grown significantly: sales grew tenfold in 10 years, reaching 119 billion rubles (excluding VAT) in 2022. The requirements for IT reliability and stability have grown accordingly.
Probably each of you has faced a situation where an IT service was unavailable at the worst possible moment. You don't have to look far for examples: it happens even with common boxed solutions that are virtually industry standards, such as JIRA or Confluence.
When this happens, it’s tempting to think, “what’s the big deal, just keep the server running”. To the business, almost any IT process can look roughly the same: a black box that sometimes works and sometimes doesn’t.
We in IT understand that hundreds of factors can make things go off plan, both in the development and in the support of IT services. For the business to understand this too, and to plan with it in mind, we need a shared context: an agreement on the level of service. When such things are agreed, everyone wins: business colleagues get a better understanding of what is happening, and IT people are more at ease about the commitments they have made.
In this article, I will tell you how we agreed on the service level with the business and how we implement it in practice at the stage of operation, support and maintenance (similar things in development, for creating new functionality, are a topic for next time).
I hope my story will be interesting for those who have faced the difficulties of scaling and improving processes at the boundary between IT and the operational part of the company.
What exactly is the problem with opacity
Below I will often use the term “IT service” to refer to any service through which information technology is provided to business units.
For example, 1C helps to process customer orders, an IT barrier management system helps to limit access to the warehouse – both of these will fall under the definition of IT service in the context of this article.
Now to the problem: the business cannot always assess whether we in IT are doing our job well or badly. This is fairly common across the industry. If you look at the experience of colleagues from different companies, you will find that many lack the metrics needed to evaluate the quality of IT services. Moreover, in practice quality often ends up being judged by how stressed the staff are.
The second problem is that IT does not know what the business wants. The company’s goals may be clear in general, but what exactly colleagues expect is rarely written down explicitly.
As a result, the business has inflated and unrealistic expectations (“our services must have 100% availability and not a minute of downtime!”), while IT colleagues live in a situation where impossible results are expected of them and the business is constantly dissatisfied with them.
It would seem that everything is clear here, so what is there to discuss? You just need to agree: get quality requirements from the business, and from IT a realistic assessment of whether it can meet them. Conceptually, yes, but in reality many non-obvious nuances arise. Let’s figure it out together.
So, at the start of this story, we wanted to reach an agreement between business and IT under which processes are safe, customers are satisfied with the level of availability, and the IT team knows exactly what to do to ensure all this and has everything it needs to do it.
In the literature, this is called the “service level management process”. At the initial stage, you need to do two things: determine the target level and the actual level. Then you can compare the requirements with the current state of affairs and form a plan to reach the desired indicators.
In our case, the scale of the company no longer allowed us to make such changes without prior preparation, so it was decided to start with a pilot project.
Our company has the concept of a “building shopping center” (hereinafter simply STC), which is the main business unit in retail. The Contact Center (CC) also plays a very important role in the company. We decided to run the pilot at one STC and in the CC.
The plan was as follows:
We carefully examine the IT services that the STC and the CC work with. We record the list of services in use and rank them by importance to the business.
Together with colleagues from the operational part of the company, we agree on the level of service, for example the response time to an incident and the time to resolve a problem.
Along the way, we refine and implement control tools, for example a balanced scorecard (BSC) system where you can monitor the dynamics and the fulfillment of requirements.
We collect reports for the test period and analyze them: we find deviations from the target level, find out the reasons, and look for ways to prevent deviations in the future. We repeat this step as many times as needed until the desired result is obtained.
Finally, we develop a plan for rolling all this out across the company.
Looking ahead, I will say that the pilot was successful: we discovered many areas for improvement and decided to extend the approach to the entire organization.
Now to the details.
Let’s look at examples
To understand what to measure and which targets are realistic, let’s take a look at our support catalog. It contains services (for example, “Printing service”), which are divided into components (“Connecting and setting up printing equipment”, “Problems with office equipment”) and IT functions (“Printer”, “MFP”).
For STC, the catalog consists of 17 IT services, which are divided into 33 components, which, in turn, are divided into 33 IT functions.
In the catalog we also record the service delivery schedule, the response time, and the problem resolution time. At the start, these were “historical” values, that is, figures based on how things had actually been working.
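To make the catalog structure more concrete, here is a minimal sketch of how such a hierarchy could be modeled. It is an illustration, not our actual system: the class and field names are hypothetical, the assignment of “Printer” and “MFP” to a particular component is assumed, and the target values are placeholders.

```python
from dataclasses import dataclass, field
from datetime import time, timedelta


@dataclass
class ITFunction:
    name: str
    response_target: timedelta    # maximum time to react to an incident
    resolution_target: timedelta  # maximum time to resolve it
    # Service delivery schedule: weekdays (0 = Monday) and daily hours.
    workdays: tuple = (0, 1, 2, 3, 4)
    day_start: time = time(8, 0)
    day_end: time = time(17, 0)


@dataclass
class Component:
    name: str
    functions: list = field(default_factory=list)


@dataclass
class Service:
    name: str
    components: list = field(default_factory=list)


def count_functions(service):
    """Total number of IT functions across all components of a service."""
    return sum(len(component.functions) for component in service.components)


# Placeholder targets and an assumed grouping, for illustration only.
printing = Service(
    name="Printing service",
    components=[
        Component(
            name="Problems with office equipment",
            functions=[
                ITFunction("Printer", timedelta(minutes=30), timedelta(hours=4)),
                ITFunction("MFP", timedelta(minutes=30), timedelta(hours=4)),
            ],
        ),
        Component(name="Connecting and setting up printing equipment"),
    ],
)
```

Keeping the schedule and both targets on the IT function level means every incident can be checked against its own limits, whichever service it belongs to.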
When the pilot project was launched, we agreed on a single target value for all services at the STC: 80%. That is, the response time may exceed the specified limit in no more than 20% of cases (each service has its own limit).
At first, there was a difficulty with the expectations of the business: we had to explain why not every IT problem can be solved in 5 minutes. The “show the real situation” approach helped us.
For example, some STCs have one IT specialist, and he physically cannot solve all problems instantly. In a dialogue with the head of the pilot STC, it turned out that the resolution time is not as important as the response time: the business understood that if we take on a task, we will complete it sooner or later. But if we don’t take it on, that can become a problem. Therefore, the KPI here is the response time to the incident.
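One way to read the 80% target is as the share of incidents that received a response within the limit. A minimal sketch of that calculation is below; the function name and the sample data are hypothetical, and the exact formula we use in reporting may differ.

```python
from datetime import timedelta


def service_level(response_times, target):
    """Share of incidents answered within the target, in percent."""
    if not response_times:
        raise ValueError("no incidents in the reporting period")
    within = sum(1 for t in response_times if t <= target)
    return 100.0 * within / len(response_times)


# Made-up response times in minutes for ten incidents.
sample = [timedelta(minutes=m) for m in (5, 12, 25, 31, 18, 29, 45, 10, 22, 30)]

level = service_level(sample, timedelta(minutes=30))
meets_target = level >= 80.0  # 8 of 10 incidents are within the limit
```

In this sample, two incidents out of ten miss the 30-minute limit, so the service level is exactly on the 80% target.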
For example, there is the IT function “Wired network”, the local network for employees and terminals. The target response time to a problem there is no more than 30 minutes, and the target resolution time is no more than 4 hours. All this applies from Monday to Friday, from 8:00 a.m. to 5:00 p.m.
Where did these numbers come from? The business would certainly like to have 0% downtime for any equipment, but these are physical devices, and sometimes they break or shut down. The time from incident detection to response should be no more than 30 minutes: this is realistic, since we were already meeting it at that point. The resolution time, however, can be up to 4 hours. That is not much, considering that finding the cause sometimes takes a significant amount of time, and that network equipment can fail and have to be physically replaced.
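Since the targets apply only within the service schedule, the response time has to be counted in working hours rather than calendar time. Below is a rough sketch of such a calculation for the Monday to Friday, 8:00 to 17:00 schedule. The function name is hypothetical, and the sketch assumes naive local datetimes and ignores public holidays.

```python
from datetime import datetime, time, timedelta

WORK_START = time(8, 0)
WORK_END = time(17, 0)
WORKDAYS = {0, 1, 2, 3, 4}  # Monday..Friday


def working_time_between(start, end):
    """Time elapsed between two moments, counting only service hours."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() in WORKDAYS:
            # Clip the interval to this day's service window.
            window_start = max(start, datetime.combine(day, WORK_START))
            window_end = min(end, datetime.combine(day, WORK_END))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total


# An incident registered on Friday evening and picked up on Monday morning.
registered = datetime(2023, 1, 6, 16, 50)   # Friday
responded = datetime(2023, 1, 9, 8, 15)     # Monday
elapsed = working_time_between(registered, responded)
within_target = elapsed <= timedelta(minutes=30)
```

Here almost three calendar days pass, but only 25 working minutes do (10 on Friday and 15 on Monday), so the response still fits the 30-minute target.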
While discussing the target indicators with the business, we learned a lot of useful things. For example, some tasks require more resources (hiring people, increasing capacity, purchasing additional equipment). A good part of the issues were resolved through such scaling.
As the final stage in the pilot project, we appointed those responsible for the target indicators in IT.
Pilot completed, what’s next
At first, we tried the approach at the pilot STC and in the CC, then we extended it to all STCs in Moscow, and now we are rolling it out to St. Petersburg and the North-West Federal District. Noticeably more people are involved: STC managers are joining the process and the dialogue and helping to develop the system. Currently, the service level at the STCs is 94%.
We enlisted the support of the business and of all of IT, and we were able to collect and systematize a lot of data on problem areas in support and maintenance. We now have actual, up-to-date IT support performance results, and the targets for availability, response time, and problem resolution time have been agreed with our colleagues in operations.
All this directly affects two important things: the quality of services and the motivation of employees. The first is simple: since we now calculate the indicators more carefully and align them with the business, everything has become more transparent and understandable, so we know what to improve and where.
Motivation is more complicated, but also understandable: instead of “aah, everything is on fire” and unrealistic expectations, we now have a clear and structured system with clear priorities and fixed agreements. In effect, there are regulations: take them and follow them.
The moral of the story: in any unclear situation, you need to negotiate. Business wants unrealistic results from us – let’s negotiate. They expect that everything will be ready yesterday – we are going to negotiate again. They don’t hear the arguments, we can’t come to an agreement – we’re going to negotiate anyway. The sooner the better. The more specific the agreements, the easier it will be to live with them later.
Good luck with your agreements!
Tell us in the comments how your support handles metrics: how do you work with SLAs, KPIs and other scary words?