Abstract
This thesis explores active learning techniques and their potential to reduce the environmental impact associated with training AI models. First, we identify a significant gap in awareness regarding sustainable AI practices among developers via a user study. This
emphasizes the need for a deeper understanding of the economic and environmental
impacts associated with training machine learning models. While AI literacy has been
explored in various contexts, there is a notable lack of focus on AI sustainability literacy.
Future research should therefore focus on sustainable AI development and provide more comprehensive frameworks for assessing and reducing environmental impacts. In this thesis,
the baseline model training pipeline serves as a foundational benchmark by utilizing
the entire pool of 8000 images without applying any sampling techniques. In contrast,
the active learning approaches examined employ methods such as random sampling,
which selects data points randomly, and least confidence sampling, which targets the
least certain predictions. Additionally, density-weighted sampling prioritizes representative data points based on a combination of feature density and uncertainty, while
diversity sampling ensures broad visual diversity by selecting images from all regions
of the feature space. These methods aim to reduce the number of images used by half,
resulting in a total of 4000 images for model training.
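To make the selection criteria above concrete, the following is a minimal sketch of least confidence sampling under the halving setup described here (an 8000-image pool reduced to 4000). The function and variable names are hypothetical and not taken from the thesis; the scores are placeholders standing in for a detector's top-class confidences.

```python
import numpy as np

def least_confidence_subset(top_confidences, budget):
    """Select the `budget` images whose highest predicted class probability is lowest.

    top_confidences: array of shape (n_images,), each entry being the detector's
    top-class confidence for one image (hypothetical input).
    """
    # Lower top-class confidence means higher uncertainty, i.e. a more informative image.
    order = np.argsort(top_confidences)  # ascending: least confident first
    return order[:budget]                # indices of the selected images

# Illustration only: halve a pool of 8000 images to 4000, as in the thesis setup.
rng = np.random.default_rng(seed=0)
scores = rng.uniform(0.3, 1.0, size=8000)            # placeholder confidence scores
subset = least_confidence_subset(scores, budget=4000)
assert len(subset) == 4000
```

Random, density-weighted, and diversity sampling differ only in the scoring or selection rule applied to the same pool. Furthermore, the subsets created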
as part of this thesis offer valuable resources for comparing object detection models
across two common wildlife conservation tasks. The results show that not all active
learning approaches are equally suitable for both scenarios. While diversity sampling and uncertainty sampling are particularly effective for more diverse datasets, approaches such as random sampling and density-weighted sampling perform relatively better on datasets with higher feature similarity than on more diverse ones. Notably, there is a trade-off between performance and efficiency:
diversity sampling achieves high performance but is less efficient compared to random
sampling, which is more resource-efficient but less performant. This highlights the importance of selecting the appropriate sampling method based on the balance required
between performance and resource usage. This thesis’ results also show that, in terms of
sustainability and efficiency, the baseline model demonstrates the highest total energy
consumption and CO2 emissions, whereas random sampling is the most energy-efficient
and environmentally friendly. The baseline model, however, achieves faster and more
reliable convergence compared to models using various sampling methods. These findings underscore the need to carefully select sampling strategies to balance performance
with resource utilization and emphasize the importance of optimizing these strategies
for sustainable and efficient model training.
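For context on the energy and CO2 figures, the abstract does not name the measurement tooling; the sketch below merely assumes a tracker such as the CodeCarbon package and a hypothetical `train_fn`, and is not a description of the thesis' actual setup.

```python
from codecarbon import EmissionsTracker  # assumed tooling; the thesis may measure differently

def train_with_tracking(train_fn, run_name="active-learning-run"):
    """Run a training function and return its estimated emissions in kg CO2-equivalent."""
    tracker = EmissionsTracker(project_name=run_name)
    tracker.start()
    try:
        train_fn()  # e.g. train the detector on a 4000-image subset
    finally:
        emissions_kg = tracker.stop()  # kg CO2eq estimated from measured energy use
    return emissions_kg
```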
| Date of Award | 2024 |
| --- | --- |
| Original language | English (American) |
| Supervisor | David Christian Schedl (Supervisor) |