Abstract
This thesis describes the process of creating voice deepfakes in a German dialect. Deepfakes fake biometric properties by means of artificial intelligence. To demonstrate this process, several deepfakes imitating a target speaker were created in a selected German dialect.
A key step in creating deepfakes in a selected German dialect was to create raw data
in this dialect. To this end, labels were created and then recorded by voice actors
who speak the dialect. One of these voice actors was chosen as the target. Because
it helps to have a dataset in the selected dialect from a speaker other than the target,
the process of creating a voice dataset for a German dialect is also described.
Text-to-speech (TTS) systems were used to create the deepfakes. Multiple
models were trained with different approaches to imitate the dialect and the target.
Deepfakes were created with these models to evaluate how models that imitate
a dialect and a target can be created efficiently. It was also
evaluated whether datasets and components from a standard language can be used to create
deepfakes in a German dialect. As a result, components and datasets from a standard
language were used to help create these models, and some components were specially
adapted.
Date of Award | 2024 |
---|---|
Original language | German (Austria) |
Supervisor | Harald Lampesberger (Supervisor) |