The general optimal power flow (OPF) problem is an important problem in the operation of electric power grids. Solution methods for the OPF have been studied extensively, but they mainly address steady-state situations and ignore both the uncertainty of state variables and their near-future evolution. In a dynamic and uncertain power system, where both demand and supply are volatile, optimization methods are therefore needed that provide solutions very quickly and remove concerns about the convergence speed or robustness of the optimization. This paper introduces a policy-based approach in which optimal control policies for a given power grid are learned offline using evolutionary computation; these policies later provide fast and accurate control actions in volatile situations. With this approach, it is no longer necessary to run an optimization procedure to solve the OPF in each new situation. Instead, the policies provide (near-)optimal actions very quickly while satisfying all constraints reliably and robustly. The result is a method for flexible, optimized power grid operation over time, which will be essential for meeting the requirements of future smart grids.
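As a rough illustration of the offline/online split described above, the following Python sketch learns a linear dispatch policy for a hypothetical two-generator system with a (1+1) evolution strategy. Everything concrete here is an assumption for illustration, not the authors' method: the quadratic cost coefficients, the generator limits, the demand normalization, the linear policy form, and the penalty weight. The network is ignored entirely, so the toy problem is a plain economic dispatch rather than a full OPF.

```python
import random

# Hypothetical two-generator system (not from the paper): cost a*p**2 + b*p.
GENS = [(1.0, 2.0), (1.0, 4.0)]
P_MIN, P_MAX = 0.0, 10.0   # assumed generator limits, same for both units
PENALTY = 1000.0           # assumed weight for constraint violations
D_REF, D_SCALE = 7.0, 5.0  # assumed normalization of the demand input


def policy(w, demand):
    """Linear policy: unit 1 follows the scaled demand, unit 2 covers the rest."""
    x = (demand - D_REF) / D_SCALE
    p1 = min(max(w[0] + w[1] * x, P_MIN), P_MAX)  # clip to unit 1 limits
    return p1, demand - p1                        # power balance by construction


def cost(w, demand):
    """Generation cost plus a penalty if unit 2 leaves its limits."""
    p = policy(w, demand)
    total = sum(a * pi ** 2 + b * pi for (a, b), pi in zip(GENS, p))
    violation = max(0.0, P_MIN - p[1]) + max(0.0, p[1] - P_MAX)
    return total + PENALTY * violation


def fitness(w, demands):
    """Average penalized cost over the offline training scenarios."""
    return sum(cost(w, d) for d in demands) / len(demands)


def learn_policy(demands, generations=3000, sigma=1.0, seed=0):
    """Offline training with a (1+1) evolution strategy and the 1/5 success rule."""
    rng = random.Random(seed)
    parent = [0.0, 0.0]
    f_parent = fitness(parent, demands)
    for _ in range(generations):
        child = [wi + sigma * rng.gauss(0.0, 1.0) for wi in parent]
        f_child = fitness(child, demands)
        if f_child <= f_parent:
            parent, f_parent = child, f_child
            sigma *= 1.5           # success: enlarge the mutation step
        else:
            sigma *= 1.5 ** -0.25  # failure: shrink it (balances at 1/5 success)
    return parent


# Offline: train on a grid of sampled demands (hypothetical range 2..12).
training_demands = [2.0 + 0.5 * k for k in range(21)]
w = learn_policy(training_demands)

# Online: a new demand is dispatched with one cheap policy evaluation.
p1, p2 = policy(w, 8.0)
print(f"w = {w}, dispatch at demand 8.0: p1={p1:.3f}, p2={p2:.3f}")
```

For these assumed coefficients, equal marginal costs give the optimal dispatch p1 = 0.5·d + 0.5, which lies inside the limits for every training demand, so the learned weights should approach it. Once trained, each online decision costs only a few arithmetic operations. Power balance holds by construction because unit 2 supplies the remainder, while the limits of unit 2 are enforced only through the penalty term.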