This thesis examines distributed training and the impact of asymmetric hardware at the microcontroller level. For this purpose, the Neural Microcontroller Model (NMCM) system was developed, which enables the execution and training of a PyTorch-based AI model on one or more microcontrollers. The NMCM system consists of two components: a parser, implemented in Python, that converts the trained AI model into a binary format and generates corresponding header files containing general information about the model; and the NMCM framework, written in C++, which manages both the execution and the training of the model on the microcontroller. The current version of the NMCM framework supports six layer types: linear, convolution, Rectified Linear Unit (ReLU), flatten, softmax, and MaxPool2d.

For distributed training, the framework employs model parallelism to divide the training process across multiple devices. Data communication between the microcontrollers during distributed training uses the Transmission Control Protocol (TCP) and the Internet Protocol (IP). Since the microcontrollers used do not natively support these protocols, each is equipped with a W5500 module, which provides socket-based connections between devices and is controlled by the respective microcontroller via a Serial Peripheral Interface (SPI).

The influence of asymmetric hardware was investigated using two AI models of different sizes. Each test run involved two microcontrollers that trained the model in a distributed manner. The microcontrollers used were the STM32F413ZHT6 and the more powerful STM32H7A3ZIT6Q; tests were conducted with every permutation of these two devices to analyse the impact on training time as well as on SRAM and flash memory consumption.
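The parser's conversion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the layer dictionary, the macro names (`NMCM_…`), and the little-endian float32 binary layout are all hypothetical choices made for the sketch; the real parser reads a trained PyTorch model instead of a hand-written description.

```python
import struct

# Hypothetical model description. The actual NMCM parser would extract
# layer types and parameters from a trained PyTorch model.
model = [
    {"type": "linear", "in": 4, "out": 2,
     "weights": [0.1] * 8, "bias": [0.0, 0.0]},
    {"type": "relu"},
]

def serialize(layers):
    """Pack all parameters into a little-endian float32 blob and emit a
    C header with the metadata a device-side framework would need to
    locate each layer's parameters inside the blob."""
    blob = bytearray()
    lines = ["#ifndef NMCM_MODEL_H", "#define NMCM_MODEL_H", "",
             f"#define NMCM_NUM_LAYERS {len(layers)}"]
    for i, layer in enumerate(layers):
        params = layer.get("weights", []) + layer.get("bias", [])
        lines.append(f'#define NMCM_L{i}_TYPE "{layer["type"]}"')
        lines.append(f"#define NMCM_L{i}_OFFSET {len(blob)}")
        lines.append(f"#define NMCM_L{i}_PARAMS {len(params)}")
        blob += struct.pack(f"<{len(params)}f", *params)
    lines += ["", "#endif"]
    return bytes(blob), "\n".join(lines)

blob, header = serialize(model)
# 10 float32 parameters (8 weights + 2 biases) -> 40 bytes of weight data
```

Separating the bulky weights (binary blob) from the lightweight structural metadata (header file) keeps the flash image compact while letting the C++ side reconstruct the network at compile time.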
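The model-parallel training loop can be illustrated with a minimal sketch: one partition computes its part of the forward pass, ships the activation to the second partition over a socket, and receives the gradient back for its local update. Everything here is an assumption made for illustration — the single-weight "layers", squared-error loss, learning rate, and the `socketpair` channel (standing in for the TCP link that the W5500 modules provide between boards) are not taken from the thesis.

```python
import socket
import struct
import threading

def send_floats(sock, values):
    # Ship float32 values over the socket, as activations/gradients
    # would travel between the two microcontrollers.
    sock.sendall(struct.pack(f"<{len(values)}f", *values))

def recv_floats(sock, n):
    buf = b""
    while len(buf) < 4 * n:
        buf += sock.recv(4 * n - len(buf))
    return list(struct.unpack(f"<{n}f", buf))

def device_a(sock):
    w, x = 0.5, 2.0                   # local "layer" weight and input
    send_floats(sock, [w * x])        # forward: send activation onward
    (grad_a,) = recv_floats(sock, 1)  # backward: gradient w.r.t. activation
    return w - 0.1 * grad_a * x       # SGD step on the local weight

def device_b(sock, out):
    (a,) = recv_floats(sock, 1)       # activation from device A
    target = 1.5
    grad = 2 * (a - target)           # d/da of the loss (a - target)^2
    send_floats(sock, [grad])         # backward: return gradient upstream
    out.append(grad)

sa, sb = socket.socketpair()          # stand-in for the inter-board TCP link
out = []
t = threading.Thread(target=device_b, args=(sb, out))
t.start()
new_w = device_a(sa)
t.join()
sa.close()
sb.close()
```

The pattern scales to the real setting by replacing the single weights with the framework's layer partitions and the `socketpair` with the W5500-backed TCP sockets; the forward activation and backward gradient exchange per step stays the same.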
Collaborative Training of Machine Learning Models in Heterogeneous Microcontroller Networks
Köhler, M. (Author). 2025
Student thesis: Master's Thesis