Self Service BI as a process, or the alchemy of working with data
Hello everyone, I’m Serhiy Bondarev, Director of Data Management and Director of Analytical Solutions at PGK, and today I want to share our experience in building Self Service BI. While preparing this material, I deliberately did not read any books or articles on the topic, so that I could share only my own experience, gained from various experiments, most of them successful.
In 2022, we designed and built a data platform that includes a warehouse with data delivery mechanisms, a reference data management system, and a BI system, including Self Service. Together, these give our teams the set of tools needed to develop digital product functionality as well as product, project, and user analytics. An understanding of the high potential of analytics developed independently by users had already formed at the stage of gathering requirements for the BI platform design.
Transporting goods by rail is a complex process that requires careful planning and coordination of every stage. Our company tracks about 150 core indicators covering transportation, commercial, and financial activities, as well as the technical condition of the wagon fleet. Rail operation is a complex business, and managing it effectively means constantly accounting for changing conditions on the basis of timely analytics.
So in our case, a Self Service solution is not another fashionable trend but an evolutionary step in the development of IT, driven by the growth of IT competencies across the company’s divisions. It is normal practice for our economists, auditors, and financiers to use programming languages in their work. This is not a boast about our employees but an observation that the boundary between IT and non-IT competencies has been blurring for a long time. IT business analysts often have domain knowledge on par with functional business experts, while business experts may know Python or SQL at the level of an IT developer. In a modern organization, the separation of IT and business units often runs along the boundaries of processes, not competencies.
In working with data, this shift is most noticeable and means a change of paradigm: from waiting months for ordered functionality through a “single IT window” to decentralization and the ability to independently create the functionality needed for one’s work. Obviously, large-scale automation, such as creating new subject areas or refining sources, is still delivered through targeted projects. But analytics on existing data can perfectly well be assembled independently; this significantly saves IT resources and speeds up the delivery of benefits.
When designing the intended shape of our Self Service BI, we were guided by the principle that it is not enough to run the install.exe of some chosen BI tool, write a multi-page disclaimer stating that IT is not responsible for anything, and grant access to the environment once it is signed. That is not how it works. Wherever the data function is assigned, its head will in any case be responsible for everything the business develops and presents to internal sites, management, and other functions. Self Service, from the very start of its construction, must not be an autonomous system without rules. So for us, Self Service is an additional BI environment with development and access processes and rules adapted to business users. In general, we see Self Service not as an IT tool but as a process, shown conventionally in the diagram below: we give all users a navigator of the functionality already developed; those who wish to build their own tools we teach development, supporting them with data documentation and methodological guidance. Then, for teams whose tools have become mass-use, we provide the processes and rules for moving to the productive circuit, where their applications receive support and guaranteed environment performance.
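The lifecycle just described, from consuming ready-made analytics to supported mass-use tools, can be sketched as explicit stages with allowed transitions. This is only an illustration of the idea; the stage names and the `promote` helper are my own, not terms from our process documentation.

```python
from enum import Enum


class Stage(Enum):
    """Illustrative stages of a team in the Self Service process."""
    CONSUMER = "uses the navigator of existing functionality"
    SANDBOX = "builds own tools after training, with methodological support"
    MASS = "tool is shared with other users"
    PRODUCTIVE = "runs in the productive circuit with support and guaranteed performance"


# A team advances one stage at a time; each promotion has entry rules
# (training, documented sources, validated indicators, and so on).
ALLOWED_TRANSITIONS = {
    Stage.CONSUMER: {Stage.SANDBOX},
    Stage.SANDBOX: {Stage.MASS},
    Stage.MASS: {Stage.PRODUCTIVE},
    Stage.PRODUCTIVE: set(),
}


def promote(current: Stage, target: Stage) -> Stage:
    """Move a team to the next stage, rejecting skipped steps."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

The point of the state machine is that a sandbox tool cannot jump straight to the productive circuit: it has to become mass-use first and meet the corresponding requirements.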
By the end of 2023, all the components of this process were in place and the puzzle was complete. As I mentioned, the core concept is to give developers from among business users well-described data via a data map and a register of indicators, to develop their competencies through regular training, and to provide them with development tools. From the very beginning, we should also assume that some user teams will build tools that get shared with other users, that is, some functionality will become mass-use. This means that all sources behind such functionality must be target sources, indicators must be calculated correctly, and the code must be described and documented so that any of the company’s analysts can understand how the final report is formed. We therefore conclude that development processes and rules, including information categorization requirements, should also be developed and communicated to teams early on. In essence, our requirements for teams that have moved from the sandbox level to mass tool development become the same as what we ask of our IT team.
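What “correctly calculated and documented” looks like in practice can be shown with a minimal sketch. The indicator, its source table, and the dataclass are all hypothetical; the point is that a mass-use indicator carries its source, methodology, and owner in the code itself, so any analyst can trace how the figure is formed.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    """One wagon trip record (illustrative structure, not our real schema)."""
    wagon_id: str
    loaded_km: float
    empty_km: float


def empty_run_ratio(trips: list[Trip]) -> float:
    """Share of empty mileage in total mileage.

    Source: dds.trips (hypothetical table name), refreshed daily.
    Methodology: sum(empty_km) / (sum(loaded_km) + sum(empty_km)).
    Owner: commercial analytics team (illustrative).
    """
    loaded = sum(t.loaded_km for t in trips)
    empty = sum(t.empty_km for t in trips)
    total = loaded + empty
    if total == 0:
        raise ValueError("no mileage recorded for the period")
    return empty / total
```

A calculation documented this way can be validated against the register of indicators before the tool moves to the productive circuit.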
And the main indicators that the process has taken hold: over the past year, we trained more than 250 employees on our internal courses. About 30 teams of business users are engaged in development, of which six build full-fledged mass-use tools and provide all company employees with analytics on the main production and commercial indicators, freeing up about 330 hours of employee time per day that was previously spent searching for the necessary data. So it is fair to say that the process works and produces measurable results. You can learn about our chosen tool stack from my previous article; I will note separately that we have in-house expertise in developing and supporting the tools we use. Along with the obvious benefits, this lets us make quick changes in response to the demands of our internal market of developer teams.
To provide the technical foundation for the whole BI process, the platform must meet the criteria of scalability, stability, and management flexibility. We therefore designed and implemented a BI platform with the following features:
We settled on a multi-node deployment of the system, which allowed us to run several nodes performing different functions, with different service levels, on top of a single data repository. We currently have Productive, Self Service, and Development environments deployed. Our configuration also allows adding further environments for specific tasks, for example for processing confidential data or employees’ personal data, or separate branch environments with the capacity needed for a specific purpose. This design has a number of advantages:
Connecting new data sources takes significantly less time and money: a data structure connected once becomes available to all environments simultaneously, subject to access restrictions;
Because all environments use the same data, incidents caused by data desynchronization or differing update rules across environments are eliminated;
Significant storage savings;
Data models created within targeted IT projects are available to other teams, including in other environments, without additional development;
Significant resource savings come from consolidation and the absence of multiple independent installations, including their maintenance, user support, and migration/synchronization processes.
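The multi-node idea can be illustrated with a small configuration sketch: several environments that differ in purpose and service level but point at one shared repository. The names and fields here are mine, chosen for illustration; they do not describe our actual deployment configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """One BI environment (node group) in a multi-node deployment."""
    name: str
    purpose: str
    guaranteed_sla: bool  # productive-grade environments get guaranteed performance
    repository: str = "central-data-repository"  # the single shared repository


ENVIRONMENTS = [
    Environment("productive",   "supported mass-use tools",   guaranteed_sla=True),
    Environment("self-service", "business-user development",  guaranteed_sla=False),
    Environment("development",  "IT development and testing", guaranteed_sla=False),
]

# Adding an environment for a specific task (say, personal data of
# employees) is just another entry; because the repository is shared,
# no data is copied and no synchronization process is needed.
ENVIRONMENTS.append(
    Environment("personal-data", "HR analytics on restricted data", guaranteed_sla=True)
)
```

The design choice the sketch highlights: environments multiply, but the repository does not, which is where the savings in storage, maintenance, and synchronization come from.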
The scaling principle described above also improves the stability of the system as a whole, because individual environments can be isolated: each environment in fact has its own independent servers (independence here means distributing user load while keeping centralized management and the connection to the central server). This makes it possible to split the load generated by users and prevent the services of a particular environment (for example, Productive) from going down due to human error, such as bugs in code running in the development environment.
Working with multiple environments and users inevitably raises the bar for flexibility in access management. In our case, access can be restricted at several levels: to the system as a whole, to a particular environment, to a specific stream, and at the level of individual data rows. If needed, it is easy to add another level of access to specific applications within a stream. User and developer rights also depend on the type of environment, streams, and applications they work with.
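The layered model above can be sketched in a few lines: coarse checks level by level (system, environment, stream), then row-level filtering on the data itself. The `User` structure, the `branch` attribute, and the function names are illustrative assumptions, not our production access code.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Illustrative user record carrying grants at each access level."""
    name: str
    has_system_access: bool = True
    environments: set = field(default_factory=set)
    streams: set = field(default_factory=set)
    branches: set = field(default_factory=set)  # drives row-level security


def can_access(user: User, environment: str, stream: str) -> bool:
    """Check access level by level: system -> environment -> stream."""
    return (user.has_system_access
            and environment in user.environments
            and stream in user.streams)


def apply_row_level_security(user: User, rows: list[dict]) -> list[dict]:
    """The finest level: keep only the rows the user's branches may see."""
    return [row for row in rows if row["branch"] in user.branches]


analyst = User("analyst",
               environments={"self-service"},
               streams={"commercial"},
               branches={"north"})

rows = [{"branch": "north", "revenue": 100},
        {"branch": "south", "revenue": 200}]
```

Because each level is checked independently, adding one more level (for example, per-application access within a stream) is just one more predicate in the chain.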
So, with Self Service as a process, in a relatively short time we democratized our data, optimized and simplified its collection and analysis, and made working with data clear and understandable. For this year, I plan to deepen cooperation with the teams producing mass-use tools: validating the sources and calculated indicators they use and further developing mass functionality.
Share your experience implementing Self Service BI in the comments.