Balancing Performance and Efficiency in AI Models with Pruning and Quantization

  • Sidney Jim Seewer

    Student thesis: Master's Thesis

    Abstract

    This thesis explores the optimization of neural networks through pruning and quantization, focusing on a convolutional neural network trained on the CIFAR-10 dataset. The increasing complexity of AI models, while improving performance across various tasks, has also led to greater computational demands, which are particularly challenging in resource-constrained environments such as mobile devices and edge computing platforms. This research evaluates how model size and computational requirements can be reduced, with only minimal loss in accuracy, by applying model pruning and quantization.
    Two pruning methods, constant sparsity and polynomial decay, were evaluated at different sparsity levels (0.5 and 0.8). The polynomial decay method, particularly at a sparsity of 0.8, proved effective, reducing model size by up to two-thirds while maintaining accuracy close to the baseline model; a sketch of both schedules follows below.
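    As an illustration of how such schedules can be configured, the sketch below uses the TensorFlow Model Optimization Toolkit, whose schedule names match those evaluated here; whether the thesis used this toolkit is an assumption, and the architecture, step counts, and training settings are placeholders, not the thesis's actual configuration.

        # A minimal sketch of the two evaluated pruning schedules, assuming
        # the TensorFlow Model Optimization Toolkit; all hyperparameters are
        # illustrative placeholders.
        import tensorflow as tf
        import tensorflow_model_optimization as tfmot

        # Hypothetical baseline CNN for CIFAR-10 (32x32 RGB, 10 classes).
        baseline = tf.keras.Sequential([
            tf.keras.Input(shape=(32, 32, 3)),
            tf.keras.layers.Conv2D(32, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])

        # Constant sparsity: prune to the target fraction immediately and hold it.
        constant_schedule = tfmot.sparsity.keras.ConstantSparsity(
            target_sparsity=0.8, begin_step=0)

        # Polynomial decay: ramp sparsity from an initial to a final value
        # (0.5 or 0.8 in the thesis) over a chosen number of training steps.
        polynomial_schedule = tfmot.sparsity.keras.PolynomialDecay(
            initial_sparsity=0.0, final_sparsity=0.8,
            begin_step=0, end_step=10000)  # end_step is an assumed value

        pruned = tfmot.sparsity.keras.prune_low_magnitude(
            baseline, pruning_schedule=polynomial_schedule)
        pruned.compile(optimizer="adam",
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])

        # The UpdatePruningStep callback advances the schedule during training;
        # strip_pruning removes the wrappers afterwards so the size reduction
        # is realized when the model is exported and compressed.
        # pruned.fit(x_train, y_train, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
        # final_model = tfmot.sparsity.keras.strip_pruning(pruned)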
    Both Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) achieved substantial improvements in model efficiency while maintaining accuracy. Combining QAT with PTQ reduced inference time by nearly 50% and shrank the model to less than one tenth of its original size, with minimal impact on accuracy.
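    The following sketch outlines one common way to combine QAT with PTQ, again assuming the TensorFlow Model Optimization Toolkit and the TFLite converter; the model and all settings are placeholder assumptions rather than the thesis's exact pipeline.

        # A minimal sketch of combining QAT with PTQ; not the thesis's
        # exact pipeline.
        import tensorflow as tf
        import tensorflow_model_optimization as tfmot

        # Hypothetical baseline CNN; a placeholder for the thesis's model.
        baseline = tf.keras.Sequential([
            tf.keras.Input(shape=(32, 32, 3)),
            tf.keras.layers.Conv2D(32, 3, activation="relu"),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])

        # QAT: wrap the model with fake-quantization nodes so the weights
        # adapt to quantization error during training.
        qat_model = tfmot.quantization.keras.quantize_model(baseline)
        qat_model.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])
        # qat_model.fit(x_train, y_train, epochs=...)

        # PTQ: convert the trained QAT model to an 8-bit TFLite model, which
        # is where the size and inference-time savings are realized.
        converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        with open("model_qat_ptq.tflite", "wb") as f:
            f.write(converter.convert())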
    The results of applying these optimization techniques show that efficient AI models can be deployed on resource-limited devices. This research contributes to ongoing efforts in AI model optimization, providing valuable insights into balancing performance with efficiency. Future work will explore the scalability of these techniques to more complex models and real-world deployment scenarios, aiming to further enhance the practicality and accessibility of optimized AI technologies.
    Date of Award: 2024
    Original language: English (American)
    Supervisor: Josef Langer (Supervisor)
