A systematic approach to working with data – the experience of PGK

Hello, Habr! My name is Serhiy Bondarev, and I am the CDO of the First Cargo Company (PGK). Today I will tell you how we approach data management tasks from the standpoint of IT development and the methodologies we use.

First, about who we are

We at PGK transport goods (steel, coal, food products, machinery and equipment) in Russia and abroad. About 100,000 wagons are under our management, and managing a fleet of that size is not a trivial task. We need to monitor the location and technical condition of the wagons and work out the logistics, for example, drawing up schedules for transferring trains at intermediate junction points.

To solve these tasks, we develop our own digital services and run the full software product development cycle. For example, our arsenal includes predictive analytics systems that let us send wagons for repair before a malfunction actually occurs, and machine-learning models that forecast the demand for our services.
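To make the predictive-maintenance idea concrete, here is a minimal sketch of the kind of model such a system could be built around. This is not PGK's actual model: the telemetry fields (mileage_km, axle_temp_c, days_since_repair), the file name, and the risk threshold are purely illustrative assumptions.

```python
# A minimal sketch (not PGK's actual model): flagging wagons for
# preventive repair from hypothetical inspection telemetry.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical accumulated history of inspections and outcomes.
history = pd.read_csv("wagon_inspections.csv")
features = ["mileage_km", "axle_temp_c", "days_since_repair"]

X_train, X_test, y_train, y_test = train_test_split(
    history[features], history["failed_within_30d"],
    test_size=0.2, random_state=42,
)

model = GradientBoostingClassifier().fit(X_train, y_train)
print(f"Hold-out accuracy: {model.score(X_test, y_test):.2f}")

# Wagons whose predicted failure probability exceeds a threshold
# become candidates for repair before an actual malfunction occurs.
upcoming = history[features].tail(100)
risk = model.predict_proba(upcoming)[:, 1]
candidates = upcoming[risk > 0.7]
```

The key prerequisite, as discussed below, is a reliable accumulated history of facts: without it, no amount of modeling skill produces a working predictor.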

It is obvious that data is the basis of every analytical product, and its sources are very diverse: it comes from the rail infrastructure owner, from our transactional systems and from other digital products. On top of that, there are accumulated historical data and various regulatory and reference information. Next, I will tell you how we manage all of it.

What we mean by data governance

A few words about what we mean by data management: managing the supply and storage of data, as well as controlling its quality, distribution and use. How much an organization needs systematic data management depends on several factors: first, on the volume of data the organization operates on; second, on the number of employees who work with it; and third, on the organization's current level of data culture and the company's strategic goals.

The Gartner model defines five levels of organizational maturity, and the maximum benefit from working with data comes at the higher levels, where advanced analytics creates value for the company and helps it earn. Even a strong team of mathematicians and analysts will not be able to build a working predictive demand model if no history of facts is being accumulated and every digital system implements its own algorithm for calculating indicators.

During the preparation of our IT strategy in 2020, we assessed the company's maturity based on the DAMA DMBoK. The assessment showed that we are firmly at the second level of maturity, so my strategic task is to consistently bring the company closer to the "Managed" level. We already cover the entire value chain with advanced analytics through various digital products, and the use of such products is a hallmark of the Managed level. But for our analytics to work, all the other processes in the company must consistently reach a higher level as well. This is ensured by systematic measures in the technological, process and cultural spheres.

What does systematic data management provide?

First, it reduces the cost and time of developing digital products and IT projects thanks to the use of a single platform, the formation and development of target competencies, and the use of reliable, supported data sources.

Second, data management opens up broad opportunities for analysts in business units to develop tools on their own. Everyone begins to understand where to get the data needed to solve a problem and where to go for the knowledge needed to work with it. This means business unit analysts can use their full potential to get answers to their questions without turning to IT.

How we approach data management

Four key components can be identified in data management: the IT component, the management processes themselves, the operating model, and the methodology. Competence management can be singled out separately: in modern conditions, forming and developing internal expertise is a necessary condition for growth.

IT platform

Data management requires tools for collecting data, calculating the necessary indicators and passing them to consumer systems. The composition, technologies and size of each tool vary depending on the volume and types of data the organization handles, the main consumers, and how quickly business results must be achieved. For example, with small amounts of information there is no need to build systems with a massively parallel architecture.

On the other hand, a large organization may have several subject-oriented repositories built on different technologies, and their mere presence is not a reason to unify them through refactoring. PGK, for instance, has historically used two warehouses based on two different technologies: one supports key business processes, the other holds operational indicators for rolling-stock operations.

Our digital strategy involves developing a data platform on top of the existing storage. We have chosen proven, lightweight components that we will continuously develop and scale: a multi-node Apache NiFi for data delivery and exchange; a fault-tolerant PostgreSQL cluster based on a commercial build for storage, with a control mechanism of our own design; Microsoft MDS as the RDM system; and a Qlik Sense cluster for products and Self-Service BI.
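As an illustration of how data might flow through such a platform, here is a hedged sketch: assume NiFi lands raw events in a staging table in PostgreSQL, and a scheduled job folds them into a showcase table. All table, column and connection names (stg.wagon_events, mart.wagon_daily, the dwh database) are assumptions for the example, not our actual schema or control mechanism.

```python
# Sketch of a staging-to-showcase step on a PostgreSQL cluster.
# Assumes NiFi has already delivered raw rows into stg.wagon_events;
# all identifiers here are hypothetical.
import psycopg2

conn = psycopg2.connect(host="pg-cluster", dbname="dwh", user="etl")
with conn, conn.cursor() as cur:
    # Atomically claim the unprocessed batch and aggregate it into the
    # daily showcase; the CTE keeps claim and aggregation in one statement.
    cur.execute("""
        WITH batch AS (
            UPDATE stg.wagon_events
               SET processed = true
             WHERE processed = false
         RETURNING wagon_id, event_ts
        )
        INSERT INTO mart.wagon_daily (wagon_id, event_date, events)
        SELECT wagon_id, event_ts::date, count(*)
          FROM batch
         GROUP BY wagon_id, event_ts::date
        ON CONFLICT (wagon_id, event_date)
        DO UPDATE SET events = mart.wagon_daily.events + EXCLUDED.events;
    """)
conn.close()
```

Keeping the claim and the aggregation in a single statement avoids losing rows that arrive mid-run, which is the kind of integrity concern a platform-level control mechanism has to address.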

As a data map, we use a solution based on Confluence plus our own metadata management module. Implementation and support are provided by internal expertise, which was built up in parallel with the launch of the project. The main consumers and developers on the platform are the teams building analytical IT tools and the product teams building digital products.
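Since the data map lives in Confluence, one way such entries could be published programmatically is through Confluence's standard REST API. The sketch below only illustrates that integration point; it is not our metadata module, and the base URL, space key, page content and service account are all hypothetical.

```python
# Hypothetical sketch: publishing a data-map entry as a Confluence page
# via the standard REST API. Names and credentials are placeholders.
import requests

CONFLUENCE = "https://confluence.example.com"  # placeholder base URL
AUTH = ("svc_datamap", "app-password")         # placeholder service account

page = {
    "type": "page",
    "title": "Data source: wagon_events",
    "space": {"key": "DATAMAP"},  # hypothetical space holding the data map
    "body": {
        "storage": {
            "value": "<h2>Owner</h2><p>Operations analytics</p>"
                     "<h2>Refresh schedule</h2><p>Hourly via Apache NiFi</p>",
            "representation": "storage",
        }
    },
}

resp = requests.post(f"{CONFLUENCE}/rest/api/content", json=page, auth=AUTH)
resp.raise_for_status()
```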

Management processes

Among the main groups of processes, I include metadata management, reference information management, the development processes of analytical products, and independent user development.

To involve employees from business units more actively in user development, together with HR we launched a training program with in-house courses on basic and advanced BI development. In addition to development competencies and data knowledge, colleagues learn how the functionality they create can be industrialized and published for widespread use. To that end, we have described all the relevant requirements and standards and made them available for study.

We are not limited to training alone: to further support colleagues from business units, we have organized a Telegram channel for sharing experience and getting help from IT developers. Since the beginning of this year, we have trained about 200 employees.

Standardizing development, creating target groups and expanding the competencies of our specialists, including to raise the overall culture of working with data, are not one-time activities but a continuous process of cultural development. Ensuring it is one of the key tasks of the CDO.

Operating model

The larger the organization, the more logical it is to apply a data management model with reasonable decentralization. It is quite difficult to find a large company with a single repository built on a single technology and processes closed within a centralized team; all of that works effectively only on presentation slides. In practice, it inevitably leads to a bottleneck on the IT side.

In such conditions it is effectively impossible to launch new products: as practice shows, effective products require teams that work by an agile methodology with minimal external dependencies. But alongside team independence, it is important to standardize data management processes and the mechanisms for controlling and supporting them. Ultimately, this helps spend fewer resources on product development and lets you launch new functionality faster.

At PGK, for example, we have centralized only the basic operations that directly affect data integrity and the continuity of digital tools. Development, metadata management and directory maintenance are decentralized. The concept of the decentralized model can be put as follows: employees with specialized data management competencies are allocated as a resource to independent project and product teams and build the functionality those products and projects need on the platform, adhering to uniform principles and standards of development and documentation. The platform (central) team ensures service continuity, connection of sources, code control on release to production, selection of specialists, development of their competencies, and training. In total, seven product teams work by this model, and we plan to increase their number.

As for reference data management and data map maintenance and support, we have scaled these processes across all teams and reference owners. The owners of master data and their administrators on the IT side have been identified. Editable directories are modified by their owners in self-service mode, and the data map is updated by teams as new data sources emerge and new data showcases are developed.

I designed the logic of the data map around the goal that every employee, regardless of department, should be able to find a description of the data they need for their work: the source, the showcase, their owners, and their properties (depth, volume, refresh regulations, metadata description). The data map and the register of data sources should give the user the basic information, leaving the data or indicator owner to provide only clarifying, targeted advice.
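To show what that "basic information" could look like in practice, here is a minimal sketch of the record a data map entry might carry, following the properties listed above. All field names and example values are illustrative, not our actual data model.

```python
# Illustrative shape of a data map entry; field names are assumptions
# derived from the properties described in the text.
from dataclasses import dataclass

@dataclass
class DataMapEntry:
    name: str              # source or showcase name
    kind: str              # "source" or "showcase"
    owner: str             # business owner of the data
    it_admin: str          # administrator on the IT side
    depth: str             # history depth, e.g. "since 2016"
    volume: str            # approximate size, e.g. "1.2 TB"
    refresh_schedule: str  # formation regulations, e.g. "daily at 04:00"
    metadata_url: str      # link to the metadata / field descriptions

# Hypothetical example entry.
entry = DataMapEntry(
    name="mart.wagon_daily",
    kind="showcase",
    owner="Operations analytics",
    it_admin="Data platform team",
    depth="since 2016",
    volume="1.2 TB",
    refresh_schedule="daily at 04:00",
    metadata_url="https://confluence.example.com/display/DATAMAP/wagon_daily",
)
```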

Methodology

Launching and embedding a process is always difficult. Questions about determining data owners, directories and target data sources arise regularly, and with dozens of teams they become very pressing.

That is why an important part of the data management system is the methodological committee: a collegial body that includes the company's main experts (methodologists and IT representatives) and is empowered to define the methodology for calculating indicators, maintain their register, and resolve the controversial issues regularly raised by representatives of project and product teams. The committee's membership is very strong; it is a genuinely working body that we launched at the end of last year.

What are our results?

In IT, we also run a number of specialized venues: one where teams and business users bring questions about data quality, another for questions about unified directories. Taken together, these measures give digital teams and business unit data teams maximum technical and methodological support. As a result, business units create tools that are valuable not only to a specific team but to the entire company, that is, candidates for mass adoption. For such tools we have established "industrialization" rules: we check them for compliance with development requirements and documentation standards, then build a support model and move them into production, after which the tools become available to a wide range of internal users.

As a result of all this work, in the year and a half since the platform, standards and processes were rolled out, BI analytics has taken root in all of the company's structural units: about 62% of the entire target audience of employees use it regularly in their work. This builds a culture of working with data and creates demand for new knowledge and skills among employees. We currently have about 40 business unit teams building analytical functionality, and they all work on a single Qlik Sense-based platform.

In the next article, I will tell you how we built our Self-Service BI model and how it is developing and evolving.