An industrial-scale brain or how to turn a dream into reality?

In the previous article, we considered different types of neural networks and discussed what tasks can be solved with their help. Now we will consider the task of artificial intelligence from an organizational and technical point of view.

When working on complex projects, a team of developers and data specialists is usually involved, and questions arise immediately: how do we manage the project, develop a machine learning (ML) model jointly, test it, and keep the code and experiment results in sync? Once the ML model has been developed and optimized, it must be deployed to a production environment. All of these problems may seem less interesting than the machine learning task itself, but they are critical to the successful delivery of ML projects.

In this article, we’ll take a detailed look at the life cycle of an ML service from idea to development and implementation, and the tools and principles used at each stage.

Life cycle and participants of the ML project

Artificial intelligence projects may seem like a new world, but in practice they follow the standard stages of an IT project. The stages related to machine learning require specialized tools, which are quite mature today. The diagram shows the main stages of an IT project, with the ML-specific ones highlighted in green:

  • Setting the business goal.

  • Evaluation, analysis and preparation of data. This is a key feature of ML projects: the results of this stage affect the viability of the entire project, so it comes at the very beginning, involves data analysis specialists right away, and allocates dedicated time to studying the data.

  • Initiating a project with a Go/No-Go decision is also a standard stage in project management.

  • Formalization of job acceptance requirements and criteria. At this stage:

    • Development and agreement with the customer of functional and non-functional requirements in the form of documents: terms of reference (TOR), test program and procedures (PMI), performance requirements, etc.

    • Determination of budget and necessary equipment.

  • The stage of development and testing of ML models and code is the most extensive in terms of resources and time.

  • If the results of the previous stages fulfill the set business requirements, then a decision is made on the industrial deployment of the solution.

  • This is followed by the operation, monitoring and updating of the solution and the ML model.

At different stages of the project, machine learning specialists are involved with different roles. Within the framework of ML projects, several key roles are distinguished, which can be performed by different or the same people or have overlapping responsibilities:

  • Business Stakeholders / Data Science Managers – oversee the process as a whole and coordinate all activities.

  • Machine Learning Engineer / MLOps Professionals

    • Provide technical implementation of training models.

    • Deploy models to production using CI/CD tools.

    • Monitor and retrain ML models on new data.

  • Data Scientists analyze data and build models from a scientific perspective. Such employees are called differently depending on the area where they work:

    • Deep Learning Engineer – works on deep neural networks.

    • Computer Vision Researcher – works on computer vision.

    • NLP Scientist – works on natural language processing.

  • Data Engineers collect and prepare data from a technical point of view:

    • Develop the infrastructure for working with data.

    • Monitor and maintain data flows and system performance, configure monitoring systems.

Next, we will consider in detail the main stages of the ML project.

Evaluation, analysis and preparation of data

In general, data analysis and preparation consists of the following stages:

Receiving data (Ingestion)

  • Obtaining and enriching data from various sources.

  • Data anonymization – cleaning of personal and business-sensitive data.

  • Data Splitting – dividing the data into training, validation, and test sets.

It is good practice at this stage to always keep the original data unchanged and to experiment on copies of it.
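The ingestion steps above can be sketched with a minimal standard-library example; the split ratios and the use of integers as stand-in records are purely illustrative:

```python
import random

def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle a copy of the data and split it into train/validation/test.

    The original list is left untouched, following the advice to keep
    the source data unchanged and experiment with copies.
    """
    shuffled = rows[:]                      # work on a copy
    random.Random(seed).shuffle(shuffled)   # deterministic shuffle
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
```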

Exploration and Validation

  • Study of data structure – min, max, average, statistical distribution.

  • Checking data types and formats for correctness.

  • Data visualization.

  • Determination of dependencies between attributes (attribute correlation).

At this stage, it is necessary to determine whether any additional data is needed or whether it is possible to move on.
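A minimal sketch of such profiling, using only the standard library (the statistics computed are the ones listed above; in practice pandas or similar tools do this work):

```python
import statistics

def profile_column(values):
    """Basic structure of a numeric column: min, max, mean, stdev."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }

def pearson(xs, ys):
    """Pearson correlation between two attributes, computed directly."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```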

Data Cleaning (Data Cleaning)

  • Data transformation.

  • Filling in missing values.

  • Removal of outliers in data.

  • Deleting data that is not relevant to the task being solved.
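The cleaning steps above might look as follows in a minimal standard-library sketch; mean imputation and a 3-sigma outlier rule are just one common choice among many:

```python
import statistics

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def drop_outliers(values, z_max=3.0):
    """Remove points more than z_max standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return values[:]
    return [v for v in values if abs(v - mean) / stdev <= z_max]
```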

After data preparation, the code and ML model development stage begins.

Development and testing of ML models

Development of ML models

The development of ML models usually consists of the following steps:

Model Training

  • Feature Engineering, dimensionality reduction, data normalization;

  • Selection of model hyperparameters.
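As a small illustration of the normalization step, here is a z-score sketch. Note the important detail: the statistics are computed on the training data only and reused unchanged for validation and test data (a standard-library sketch, not a production implementation):

```python
import statistics

def standardize(values):
    """Z-score normalization: zero mean, unit variance.

    Returns the scaled values plus the (mean, stdev) statistics, which
    must be computed on the training set and reused for other splits.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values], (mean, stdev)

def apply_standardization(values, stats):
    """Scale new data with statistics taken from the training set."""
    mean, stdev = stats
    return [(v - mean) / stdev for v in values]
```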

Model Engineering

  • Selection of the appropriate ML-model or combination of models;

  • Development and versioning of ML-model code;

  • Quality assessment and selection of the best model;

  • Optimization of model hyperparameters.

Model Evaluation & Testing

  • Verifying that the model fulfills the business requirements;

  • Testing on the validation data.
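Hyperparameter selection and evaluation on validation data can be sketched with a toy k-nearest-neighbors example; the data and the candidate values of k are purely illustrative:

```python
def knn_predict(train, query, k):
    """Predict the majority label of the k nearest training points (1-D)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

def accuracy(train, data, k):
    """Fraction of correctly classified (x, label) pairs."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)

# Toy 1-D data: class 0 clustered near 0, class 1 clustered near 10.
train = [(0, 0), (1, 0), (2, 0), (9, 1), (10, 1), (11, 1)]
val = [(1.5, 0), (9.5, 1), (0.5, 0), (10.5, 1)]

# Pick the best hyperparameter k by its score on the validation split.
best_k = max([1, 3, 5], key=lambda k: accuracy(train, val, k))
```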

Model Packaging

Automation of the process of development and testing of ML models (MLOps)

Specialized tools must be used to perform tasks within ML projects. Such tools ensure the reproducibility of results at all stages of the ML model's life cycle:

  • Repeatability – reproducibility of experiments during development

  • Reproducibility – transparent transfer of the ML model to the industrial site

  • Replicability – replication of a common solution

To automate data analysis, pipeline-building tools are used; they allow you to version data and models and perform the necessary steps automatically. An example of such a system is Data Version Control (DVC), often described as "Git for data".

The concept of MLOps appeared to automate the development and deployment of ML models. MLOps tools are used to save the results of ML experiments, model versions, for testing and deploying models. Currently, there are several tools that implement the basic functionality of MLOps:

MLflow platform

Let’s take a closer look at the MLflow platform. It consists of the following main components designed to work with ML projects:

  • MLflow Tracking – monitors and logs the model training process. Stores experimental results, configuration data, and model hyperparameters. Allows you to visualize metrics, compare results and choose the best model option.

  • MLflow Projects – packages data, code, and all dependencies so that experiments can be repeated on different platforms.

  • MLflow Models – saves ML models in standard formats ("flavors") for further deployment in different environments.
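Conceptually, what experiment tracking stores can be pictured with a toy in-memory stand-in. This is not the real MLflow API, just an illustration of recording parameters and metrics per run and comparing runs to pick the best model:

```python
# A toy stand-in for experiment tracking (not the real MLflow API):
# each run records its hyperparameters and metrics, and the best run
# is selected by comparing a chosen metric across runs.
runs = []

def log_run(params, metrics):
    """Record one training run with its configuration and results."""
    runs.append({"params": params, "metrics": metrics})

log_run({"lr": 0.1, "epochs": 10}, {"val_accuracy": 0.87})
log_run({"lr": 0.01, "epochs": 20}, {"val_accuracy": 0.91})
log_run({"lr": 0.001, "epochs": 50}, {"val_accuracy": 0.89})

# Compare runs by validation accuracy, as a tracking UI would.
best = max(runs, key=lambda r: r["metrics"]["val_accuracy"])
```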

MLflow deployment diagrams

There are different schemes for deploying and using MLflow. I will present the most general scheme with a dedicated MLflow Server:

With such a scheme, the process of developing and deploying an ML model looks like this:

  • The ML model is developed and tested on local hardware that is integrated with the MLflow Tracking server.

  • The source codes and data for building the ML model are stored in Git.

  • Trained ML models are stored in the MLflow Model Registry.

  • MLflow Models transfers the model to a virtual environment for local deployment, or to a Docker container for deployment on cloud platforms and Kubernetes.

  • Using the MLflow deployment tools, the model is deployed to the production environment.

ML-model performance is monitored using special platforms, such as Evidently.

Deployment of ML models

Deployment patterns depending on the type of learning and prediction

The following types of learning and prediction are distinguished:

  • ML-model Training:

    • Offline Training – the model is trained on already collected (historical) data; in operation it remains valid for some time, after which it must be retrained on current data.

    • Online Training – Constant retraining of the model on new data.

  • ML-model Prediction:

    • Offline (batch prediction) – in this case, pre-generated data is used for prediction.

    • Real-time prediction (On-demand predictions) – predictions are computed from newly arriving request data.

Using these types, the following matrix can be constructed:

Its cells contain the names of schemes that implement the necessary functionality:

The cells also contain templates for embedding ML models into industrial systems, which can be used to implement the listed schemes:

  • Model-as-Service

  • Model-as-Dependency

  • Model-on-Demand

  • Precompute serving

Next, I will briefly describe the schemes and templates:

Forecast – Static data in the form of files is usually used to build the model, and prediction is also performed on static data. BI systems and data science research work in this mode. The scheme is not intended for operation in production systems.

Web service – The most popular scheme. The ML model is built on historical data, but predictions are made on data arriving with each request. Retraining on current data can be started periodically, or a request itself can trigger the training process (batch run).

Online learning – The most dynamic scheme. It usually works on streaming data, when the model must change constantly. Retraining may happen not in the production system itself but in a parallel one; in that case the name "incremental training" fits better. In such systems, there is a risk that poor-quality incoming data will degrade the model.
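The difference between offline (batch) and online (incremental) training can be illustrated with a trivial model whose state is updated either over a whole batch at once or one observation at a time (a conceptual sketch only, not a real learner):

```python
class RunningMeanModel:
    """A trivial 'model' that predicts the mean of the targets seen so far.

    Offline training fits on a whole collected batch; online (incremental)
    training updates the same state on each new observation.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def fit_batch(self, ys):
        """Offline: one pass over already collected (historical) data."""
        for y in ys:
            self.partial_fit(y)

    def partial_fit(self, y):
        """Online: incremental update on a single new observation."""
        self.n += 1
        self.mean += (y - self.mean) / self.n

    def predict(self):
        return self.mean
```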

Automated machine learning – this scheme involves automatic learning, optimization of results and selection of the ML model. The implementation of such systems is often a more difficult task than online learning, because it only requires the user to provide data, and the ML model is automatically selected by the system itself. Usually implemented by large AI providers such as Google or Microsoft.

Patterns for embedding ML models (Model Serving Patterns)

Each prediction and learning scheme can be implemented by different technical templates:

Model-as-Service – The simplest template. The ML-model works as a service to which the application makes requests using the REST API:
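A minimal sketch of this pattern using only the Python standard library; the model and its weights are hypothetical stand-ins for a real trained model:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in model: a hypothetical linear scoring rule."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))

class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The client POSTs {"features": [...]} and gets {"prediction": ...}.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("0.0.0.0", 8080), PredictionHandler).serve_forever()
```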

Model-as-Dependency – The most straightforward way to use the ML model. When implementing this template, the model is embedded in the application:

Precompute serving – when implementing the template, pre-prepared predictions are used:
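A toy sketch of precompute serving: predictions for all known inputs are computed ahead of time, and serving becomes a lookup. In practice a database table would hold the predictions; `expensive_predict` here is a stand-in for costly model inference:

```python
def expensive_predict(x):
    """Stand-in for a costly model inference."""
    return x * x

# Precompute predictions for every known input offline.
known_inputs = range(1000)
prediction_table = {x: expensive_predict(x) for x in known_inputs}

def serve(x):
    """Serving is a fast lookup; unseen inputs need a fallback policy."""
    return prediction_table.get(x)  # None -> fall back to the live model
```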

Model-on-Demand – the pattern is similar to Model-as-Dependency, but uses a message-broker architecture with two components:

  • Message Broker with two queues;

  • Event processor that handles requests.
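The two-queue broker scheme can be sketched with in-process queues standing in for the message broker (a conceptual illustration, not a production broker):

```python
import queue
import threading

requests = queue.Queue()   # broker queue carrying prediction requests
responses = queue.Queue()  # broker queue carrying predictions back

def model(x):
    """Stand-in ML model."""
    return x * 2

def event_processor():
    """Consumes requests from the broker, runs the model, publishes results."""
    while True:
        req_id, features = requests.get()
        if req_id is None:  # shutdown sentinel
            break
        responses.put((req_id, model(features)))

worker = threading.Thread(target=event_processor, daemon=True)
worker.start()

requests.put(("req-1", 21))                  # application publishes a request
req_id, prediction = responses.get(timeout=5)  # and consumes the prediction
requests.put((None, None))                   # stop the worker
```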


In this article, we reviewed the stages of implementing machine learning projects and emphasized that every stage matters for success. It turns out that the world of artificial intelligence consists not only of data scientists, but also of ML engineers who bring ideas to life. We discussed in detail the stages of data analysis and preparation, the development and testing of ML models, and the schemes for their deployment in a production environment.

Without competent technical implementation and organization of the process, a great idea will remain only an abstraction!
