Abstract
This master’s thesis examines the application of the reinforcement learning algorithm SARSA (State-Action-Reward-State-Action) to dynamic stacking problems. These problems are characterized by the goal of efficiently stacking or rearranging objects while accounting for uncertainties and time-dependent factors. The focus lies on modeling the problem as a Markov Decision Process (MDP) and on iteratively optimizing strategies with SARSA. An existing simulation environment was used and extended with specific reward functions to analyze various scenarios. Particular attention was given to testing different policy settings, varying reward weights, and the effects of different hyperparameter configurations. The results demonstrate that SARSA, with its on-policy nature and stable convergence, is a promising approach for such problems. Furthermore, the influence of these parameters on the efficiency and stability of the learning process was examined. The findings provide valuable insights for applying reinforcement learning to dynamic industrial systems.
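For orientation, the following is a minimal sketch of the tabular SARSA update the abstract refers to, written in Python. The toy two-pile stacking environment, the action set, and all hyperparameter values below are illustrative assumptions, not the thesis's actual simulation environment or configuration.

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (illustrative value)
GAMMA = 0.95   # discount factor (illustrative value)
EPSILON = 0.1  # exploration rate of the epsilon-greedy policy

ACTIONS = (0, 1)  # e.g. 0 = place item on pile A, 1 = place on pile B

class ToyStackingEnv:
    """Hypothetical stand-in for a stacking simulation: two piles,
    state = height difference, reward penalizes imbalance."""

    def __init__(self, n_items=10):
        self.n_items = n_items

    def reset(self):
        self.remaining = self.n_items
        self.diff = 0  # height(pile A) - height(pile B)
        return self.diff

    def step(self, action):
        self.diff += 1 if action == 0 else -1
        self.remaining -= 1
        reward = -abs(self.diff)  # weighted reward terms would combine here
        return self.diff, reward, self.remaining == 0

Q = defaultdict(float)  # Q[(state, action)] -> current value estimate

def epsilon_greedy(state):
    """Explore with probability EPSILON, otherwise act greedily on Q."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sarsa_episode(env):
    """One episode with the on-policy SARSA update at every step."""
    state = env.reset()
    action = epsilon_greedy(state)
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        next_action = epsilon_greedy(next_state)
        # On-policy target: uses the action the policy actually takes next,
        # which is what distinguishes SARSA from off-policy Q-learning.
        target = reward + (0.0 if done else GAMMA * Q[(next_state, next_action)])
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state, action = next_state, next_action

env = ToyStackingEnv()
for _ in range(1000):
    sarsa_episode(env)
```

After enough episodes, the greedy policy keeps the two pile heights close together. In the thesis itself, the same update rule operates on the extended simulation environment with its scenario-specific reward weights.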
I would like to extend my heartfelt thanks to the Josef Ressel Center for Adaptive Optimization in Dynamic Environments for providing this exciting and challenging topic. Working on a practical and scientifically demanding problem has
significantly deepened my understanding and enhanced my skills.
| Date of Award | 2024 |
|---|---|
| Original language | German (Austria) |
| Supervisor | Stefan Wagner (Supervisor) |