From Phonemes to Fluency: Using Self-Supervised Learning to Track Children’s Reading

  • Philipp Ollmann

    Student thesis: Master's Thesis

    Abstract

    Reading proficiency in early childhood is crucial for academic success and intellectual development. However, a growing number of children struggle with reading: according to the most recent PISA study in Austria, one in five children has reading difficulties. The reasons for this are diverse. A mobile app that tracks children while they read aloud and guides them when they run into difficulties could offer meaningful help. This thesis therefore explores a prototype of the core component of such an app, which tracks children’s reading using a self-supervised Wav2Vec2 model trained on a limited amount of data. Self-supervised learning allows models to learn general representations from large amounts of unlabeled audio, which can then be fine-tuned on smaller, task-specific datasets. This makes it especially useful when labeled data is scarce. The developed model operates at the phonetic level, using the International Phonetic Alphabet (IPA). For training, the kidsTALC dataset from Leibniz University Hannover was used; it contains spontaneous speech recordings of German-speaking children. To enlarge the training data and improve robustness, several data augmentation techniques were applied and evaluated, including pitch shifting, formant shifting, and speed variation. The models were trained on different data configurations to compare the effects of data variety and data quality on recognition performance. The best model trained in this work achieved a phoneme error rate (PER) of 14.3% and a word error rate (WER) of 31.6% on unseen child speech, demonstrating the potential of self-supervised models for this use case. An attempt was also made to deploy the model on a mobile device to test real-time feasibility. This was not successful because of the model's current size, which highlights a key area for future optimization.
    This thesis aims to test the planned approach for a mobile reading support system that can recognize reading difficulties at the phoneme level and provide personalized feedback to young readers.
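    The PER and WER figures reported above are both edit-distance metrics: the number of substitutions, deletions, and insertions needed to turn the recognized sequence into the reference, divided by the reference length. The record does not say exactly how the thesis computed or aggregated them, so the following is only a minimal sketch under stated assumptions: phonemes are given as space-separated IPA tokens (so multi-character symbols stay intact), words as whitespace-separated tokens, and corpus-level rates pool edits and reference lengths over all utterances. The helper names `edit_distance`, `error_rate`, and `corpus_error_rate` are hypothetical, and the German example words are illustrative, not taken from kidsTALC.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over tokens (substitutions + deletions + insertions)."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference token r was deleted
                curr[j - 1] + 1,         # hypothesis token h was inserted
                prev[j - 1] + (r != h),  # substitution, free if tokens match
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Per-utterance error rate: edits divided by reference length."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def corpus_error_rate(pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Corpus-level rate (assumed aggregation): total edits / total reference tokens."""
    pairs = list(pairs)
    total_edits = sum(edit_distance(r, h) for r, h in pairs)
    total_ref = sum(len(r) for r, _ in pairs)
    if total_ref == 0:
        raise ValueError("references must contain at least one token")
    return total_edits / total_ref


if __name__ == "__main__":
    # Hypothetical example: "Hund" with /ʊ/ recognized as /u/.
    ref_ph, hyp_ph = "h ʊ n t".split(), "h u n t".split()
    print(f"PER = {error_rate(ref_ph, hyp_ph):.2f}")  # PER = 0.25
    ref_w, hyp_w = "der Hund läuft".split(), "der Hund lauft".split()
    print(f"WER = {error_rate(ref_w, hyp_w):.2f}")    # WER = 0.33
```

    The example also hints at one plausible reason why WER is usually well above PER: a single wrong phoneme, such as "ä" recognized as "a", turns the whole word into an error, so the word-level rate amplifies phoneme-level mistakes.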
    Date of Award: 2025
    Original language: English
    Supervisor: Erik Sonnleitner

    Study program

    • Mobile Computing
