Abstract
This thesis explores the optimization of neural networks through the application of pruning and quantization techniques, focusing on a convolutional neural network trained on the CIFAR-10 dataset. The increasing complexity of AI models, while improving performance across various tasks, has also led to greater computational demands, which are particularly challenging in resource-constrained environments such as mobile devices and edge computing platforms. This research evaluates how model size and computational requirements can be reduced, with minimal loss in accuracy, by applying model pruning and quantization.
Two pruning methods, constant sparsity and polynomial decay, were evaluated at two sparsity levels (0.5 and 0.8). The polynomial decay method, particularly at a sparsity of 0.8, proved effective in reducing model size by up to two-thirds while maintaining accuracy close to the baseline model.
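The abstract does not name the framework used, but the schedule names match the TensorFlow Model Optimization Toolkit API, so a minimal sketch of magnitude pruning with a polynomial decay schedule is given below under that assumption. The architecture, step counts, and training settings are illustrative placeholders, not the thesis configuration.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder CNN for CIFAR-10; the exact architecture used in the thesis
# is not given in the abstract.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Polynomial decay schedule: sparsity ramps from 0% up to the 80% target.
# ConstantSparsity(0.5, begin_step=0) would hold a fixed 50% instead.
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,
    final_sparsity=0.8,
    begin_step=0,
    end_step=2000,  # illustrative; depends on batch size and epoch count
)
pruned = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=schedule)

pruned.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
pruned.fit(
    x_train / 255.0,
    y_train,
    epochs=4,
    # Required callback: advances the pruning schedule every training step.
    callbacks=[tfmot.sparsity.keras.UpdatePruningStep()],
)

# Strip the pruning wrappers so the sparse weights can be exported/compressed.
final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```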
Both Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) substantially improved model efficiency while maintaining accuracy. Combining QAT with PTQ reduced inference time by nearly 50% and shrank the model to less than one tenth of its original size, with minimal impact on accuracy.
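A sketch of the combined QAT-then-PTQ pipeline follows, again assuming the TensorFlow Model Optimization Toolkit and the TensorFlow Lite converter; `final_model` refers to the pruned model from the previous sketch, and the epoch count and output filename are illustrative.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# QAT: wrap the model with fake-quantization ops so fine-tuning adapts the
# weights to 8-bit arithmetic. final_model is the stripped pruned model from
# the previous sketch (an assumption, not stated in the abstract).
qat_model = tfmot.quantization.keras.quantize_model(final_model)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
qat_model.fit(x_train / 255.0, y_train, epochs=2)  # short fine-tuning pass

# PTQ via the TFLite converter: produces a quantized model for deployment.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

with open("cifar10_quantized.tflite", "wb") as f:
    f.write(tflite_bytes)
```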
The results show that these optimization techniques make it feasible to deploy efficient AI models on resource-limited devices. This research contributes to ongoing efforts in AI model optimization, providing valuable insights into balancing performance with efficiency. Future work will explore the scalability of these techniques to more complex models and real-world deployment scenarios, aiming to further enhance the practicality and accessibility of optimized AI technologies.
| Date of Award | 2024 |
|---|---|
| Original language | English (American) |
| Supervisor | Josef Langer |