Retraining of Neural Networks on Resource-Limited Devices

  • Florian David Meißl

    Student thesis: Master's Thesis

    Abstract

    Machine vision drives efficiency and automation in many industries. Since 2012, convolutional neural networks have taken over the leading role in this field of research.
    Key drivers for the quick advancements of neural networks in computer vision tasks are
    Moore’s law and the rapid development of algorithms for artificial intelligence (AI).
    Quality control tasks at assembly lines or applications for security and surveillance
    are often built on resource-limited embedded devices. Requirements like data privacy
    or real-time demands necessitate local data processing. However, neural networks place
    high demands on a system’s hardware. Thus, algorithm-specific and hardware-specific
    optimizations are mandatory to enable AI on resource-limited devices. Use cases in which the AI must be retrained for fine-tuning or for adaptation to environmental changes place especially high demands on the hardware. Therefore, optimizations for
    the back-propagation algorithm, as well as optimal hardware occupancy, are essential
    to enable the training of neural networks on embedded devices.
    This thesis compares three state-of-the-art machine learning (ML) frameworks regarding their computational performance and memory footprint. The three ML frameworks, namely PyTorch, ONNX Runtime, and TensorRT, were utilized to retrain the
    VGG16 vision model on the CIFAR-10 dataset. Since TensorRT is a highly optimized
    inference framework, its Network Definition API was used to implement a training
    model. Measurements on the CPU and GPU of an NVIDIA Jetson Orin utilizing performance optimization techniques like layer freezing and reduced floating-point precision
    were carried out.
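    To make the setup concrete, the following minimal PyTorch sketch, an assumption-based illustration rather than the thesis implementation, loads torchvision's pretrained VGG16, replaces its classifier head with a ten-class output for CIFAR-10, and runs a plain retraining loop; batch size, learning rate, and epoch count are illustrative placeholders.

        # Hedged sketch (assumptions, not the thesis code): retraining
        # torchvision's VGG16 on CIFAR-10 with plain PyTorch.
        import torch
        import torch.nn as nn
        from torch.utils.data import DataLoader
        from torchvision import datasets, models, transforms

        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # CIFAR-10 images are 32x32, while VGG16 expects ImageNet-sized inputs,
        # so the images are resized and normalized with ImageNet statistics.
        transform = transforms.Compose([
            transforms.Resize(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])
        train_set = datasets.CIFAR10(root="data", train=True, download=True,
                                     transform=transform)
        train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

        # Start from ImageNet-pretrained weights and replace the 1000-class
        # classifier head with a 10-class head for CIFAR-10.
        model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        model.classifier[6] = nn.Linear(4096, 10)
        model = model.to(device)

        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

        model.train()
        for epoch in range(2):  # illustrative epoch count
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()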
    The experiments show that PyTorch is highly efficient at training convolutional neural networks on NVIDIA GPUs. Furthermore, the results indicate that TensorRT cannot effectively optimize networks in which the model parameters are themselves inputs to the network.
    Optimizations like layer freezing and reduced floating-point precision can speed up the
    training process by nearly a factor of three.
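    The two optimizations can be sketched in PyTorch as follows, continuing the snippet above by reusing model, train_loader, criterion, and device: the convolutional feature extractor is frozen so back-propagation only updates the classifier head, and automatic mixed precision (torch.cuda.amp) runs the forward pass in reduced floating-point precision. Which layers were actually frozen in the thesis, and the exact precision settings used, are assumptions made for illustration.

        # Hedged sketch of the two optimizations, continuing the snippet above
        # (reuses model, train_loader, criterion, and device). Which layers were
        # frozen in the thesis is an assumption made for illustration.
        import torch

        # Layer freezing: exclude the convolutional feature extractor from
        # back-propagation so only the classifier head is updated.
        for param in model.features.parameters():
            param.requires_grad = False

        optimizer = torch.optim.SGD(
            (p for p in model.parameters() if p.requires_grad),
            lr=1e-3, momentum=0.9)
        scaler = torch.cuda.amp.GradScaler()  # loss scaling guards against FP16 underflow

        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            with torch.cuda.amp.autocast():   # forward pass in reduced precision
                loss = criterion(model(images), labels)
            scaler.scale(loss).backward()     # back-propagate the scaled loss
            scaler.step(optimizer)
            scaler.update()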
    Date of Award: 2024
    Original language: English (American)
    Supervisors: Josef Langer & Philipp Knaack
