A brief overview of TinyML / Habr

TinyML means the implementation of machine learning on low-power microcontrollers and embedded systems. It allows IoT devices to perform data processing and machine learning tasks directly on the device itself, minimizing the need for a constant Internet connection or external computing resources. The main goal of TinyML is to make machine learning, at least in the form of simple models, available on the smallest devices.

Basic principles

Unlike traditional machine learning, which sometimes requires enormous computing power and cloud resources, TinyML is optimized to run on devices with very limited resources, such as microcontrollers. TinyML makes it possible to implement intelligent functions in the smallest devices imaginable: from smartwatches to sensors in agriculture.

TinyML can also run on incredibly low power. This is a huge plus, given that many IoT devices are powered by batteries that need to last as long as possible: for example, smart sensors that collect and analyze data for months or even years on a single battery charge.

The principle of data collection is quite simple: it is carried out using sensors that can be built into almost any device. This data is then processed directly on the device using pre-trained ML models, allowing for on-the-spot decision making without the need to send data to a server or the cloud.

TinyML also differs from traditional ML in its heavy emphasis on optimization, forced by the constraints it operates under: limited memory and very modest computing power.

Architecture and technologies of TinyML


ARM Cortex-M

ARM Cortex-M microcontrollers are built around a 32-bit RISC architecture and share several architectural features:

Cortex-M cores use the Harvard architecture, which separates the instruction and data buses. This separation allows simultaneous access to instructions and data, increasing processing efficiency.

The ARMv7-M core in Cortex-M microcontrollers includes a register bank of 16 registers, where the first 13 (R0-R12) are general purpose and the last three serve specific functions: the stack pointer (R13), the link register (R14), and the program counter (R15).

The Cortex-M4 uses a load-store architecture, which means that data operations require loading values from memory into registers, processing them there, and storing the results back only when necessary.

Cortex-M microcontrollers can contain additional components such as Bit-Band memory for direct bit manipulation, a memory protection unit to control access and privileges for different areas of memory, and Tightly-Coupled Memory (TCM) for low-latency access to critical data or code. Features such as instruction and data caches and ECC to detect and correct TCM and cache errors are also optional and vary by model.

For example, the Cortex-M4 supports the Thumb-2 instruction set, which combines 16-bit and 32-bit instructions, offering a balance between performance and code density. In addition, the Cortex-M4 introduces digital signal processing capabilities, increasing its usefulness in applications that require signal processing or complex mathematical calculations.


RISC-V

The architecture of RISC-V microcontrollers is designed to be open and modular, allowing for extensive customization and optimization for specific applications or domains. The main features of the RISC-V architecture are a base instruction set and a set of standard extensions that add specialized functionality.

The base instruction set (RV32I, or its 64-bit variant RV64I) includes only the operations necessary for basic functionality with integers: addition, subtraction, bitwise operations, loads and stores, jumps, and branches. It comprises 47 instructions, each encoded in a fixed 32-bit format.

The register file is a key component of the RISC-V architecture, providing a set of locations for storing data during instruction execution. It is organized into integer registers and, depending on the extensions implemented in the processor, floating-point registers. Integer registers are used to store and manipulate integer values during instruction execution, including addition, subtraction, multiplication, division, bit manipulation, and comparison operations.

RISC-V includes a set of standard extensions such as:

M-extension adds support for integer multiplication and division instructions.

A-extension provides support for atomic memory operations.

F- and D-extensions add support for single- and double-precision floating-point arithmetic operations, respectively.

C-extension introduces a set of 16-bit compressed instructions that can be used alongside standard 32-bit instructions.


TensorFlow Lite

TensorFlow Lite for Microcontrollers, a framework from Google, is a simplified version of TensorFlow Lite optimized for microcontrollers and other resource-constrained devices. It allows machine learning models to be developed and deployed on very low-power hardware.

With it, you can convert TensorFlow models to a format compatible with TensorFlow Lite for Microcontrollers and deploy them on the target device.

The framework includes a lightweight interpreter that executes machine learning models directly on the target device, providing fast inference without the need to connect to cloud services.

TensorFlow Lite for Microcontrollers contains an optimized set of kernels specifically designed to perform standard machine learning operations such as convolutions, activations, and pooling with maximum efficiency on microcontrollers.

The framework comes with model conversion, validation, and debugging tools that simplify the development and testing of machine learning applications on microcontrollers.


#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "model.h" // header file with the model data

// globals for TFLite Micro
constexpr int kTensorArenaSize = 10 * 1024;
uint8_t tensor_arena[kTensorArenaSize];
tflite::MicroErrorReporter micro_error_reporter;

// initialize and run the model
void RunInference() {
  // map the model
  const tflite::Model* model = ::tflite::GetModel(g_model);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    TF_LITE_REPORT_ERROR(&micro_error_reporter, "Model version mismatch.");
    return;
  }

  // set up the interpreter
  tflite::AllOpsResolver resolver;
  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                       kTensorArenaSize, &micro_error_reporter);

  // allocate memory from the arena for the model's tensors
  if (interpreter.AllocateTensors() != kTfLiteOk) {
    TF_LITE_REPORT_ERROR(&micro_error_reporter, "AllocateTensors() failed.");
    return;
  }

  // pointer to the input tensor
  TfLiteTensor* input = interpreter.input(0);

  // fill in the model's input data here,
  // e.g. write the input values into input->data.f

  // run the model
  TfLiteStatus invoke_status = interpreter.Invoke();
  if (invoke_status != kTfLiteOk) {
    TF_LITE_REPORT_ERROR(&micro_error_reporter, "Invoke failed.");
    return;
  }

  // pointer to the output tensor
  TfLiteTensor* output = interpreter.output(0);

  // use the output tensor data here,
  // e.g. read output->data.f to get the inference results
}

int main() {
  RunInference();
  return 0;
}
You can read more about the library in the documentation.


MicroTVM

MicroTVM is an Apache TVM project designed to optimize and deploy machine learning models on microcontrollers and other resource-constrained devices. MicroTVM automates the process of optimizing models, making them more suitable for inference on tiny devices.


Cube.AI

Cube.AI from STMicroelectronics allows you to convert pre-trained neural networks into optimized code for STM32 microcontrollers. This framework simplifies the integration of AI models into embedded systems.

Microcontrollers for implementing TinyML

Various microcontrollers and platforms are used to implement TinyML-based projects. Devices available on the international market include ARM Cortex-M based boards such as the Arduino Nano 33 BLE Sense, as well as products based on RISC-V, ESP32, and many others. Examples include the Raspberry Pi 4, BeagleBone AI, Sony SPRESENSE, and Raspberry Pi Pico, which make it possible to implement even basic artificial intelligence programs.

In Russia, there are also microcontrollers in development that support tiny machine learning technologies. One example is the MK32AMUR microcontroller from Mikron, based on the RISC-V architecture. This microcontroller has built-in cryptographic protection and is intended for use in critical infrastructure and devices with high security requirements. Mikron notes high interest in this product from manufacturers of various equipment.

Real examples

One example is medical mask recognition on an ARM Cortex M7 microcontroller using TensorFlow Lite. The size of the model after quantization was about 138 KB, and the data processing speed on the target platform reached 30 frames per second.

Another device is a gesture recognition device that can be attached to a cane and help visually impaired people navigate their daily lives. The developers used a dataset of gestures to train a ProtoNN model with a classification algorithm, resulting in an accurate and inexpensive solution.

TinyML also finds application in autonomous vehicles and self-driving cars, where closed-loop learning based on a TinyCNN model is used for online prediction, allowing the model to adapt to real data in real time.

TinyML also improves UAV applications by enabling energy-efficient, low-latency devices with enough computing power to act as controllers for these UAVs.

Examples are taken from this source.

TinyML is not yet at its peak, but it is already helping to develop new applications, from medical diagnostic devices to autonomous vehicles.

For those interested in learning more about TinyML, I recommend checking out Pete Warden’s YouTube podcast on getting started with TinyML. Podcast link: tinyML Talks – Pete Warden: Getting started with TinyML.

And within the framework of OTUS courses, you can learn both the basics of classic ML and more hardcore approaches and tools. The full catalog of courses and their programs can be found at the link.
