Improving Forensic Audio Analysis: An Automated Transcription and Topic Modeling System

  • Vanessa Gloria Malli

    Student thesis: Master's Thesis

    Abstract

    Digital forensics is increasingly challenged by the vast and growing volumes of digital evidence, particularly in the form of audio recordings. Manual review of such data is time-consuming and error-prone, and it hampers the efficiency of investigations. This thesis addresses these challenges by developing an automated system that combines audio transcription with thematic analysis using topic modeling. The goal is to assist investigators in efficiently identifying relevant information within large audio datasets. The proposed system follows a modular architecture and leverages state-of-the-art technologies such as Wav2Vec 2.0 for automatic speech recognition and BERTopic for semantic analysis of transcribed texts. By employing High-Performance Computing (HPC), the system meets the computational demands of processing large-scale audio data and enables scalable, real-time analysis. Legal and ethical considerations, such as data privacy and evidence integrity, have been incorporated to ensure the system’s applicability in forensic contexts. A fully functional prototype was implemented as a command-line tool, designed to integrate seamlessly into existing forensic workflows. Extensive benchmarking demonstrates that the system achieves high transcription accuracy even under challenging acoustic conditions and reliably extracts thematic structures. This significantly enhances the speed and quality of forensic investigations, allowing investigators to focus on the most critical information. The results highlight the potential of automated audio analysis for digital forensics. The developed system supports more efficient workflows and improved resource allocation. Future work may explore enhancements such as multilingual capabilities, integration of additional NLP modules, and improved usability for non-technical forensic practitioners.
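The abstract reports high transcription accuracy but does not name the metric used. As an illustration only, the sketch below assumes accuracy is measured as word error rate (WER), a common metric for speech recognition: the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate of `hypothesis` against `reference`.

    Assumption: transcripts are compared case-insensitively and
    tokenized on whitespace; the thesis may normalize text differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words yields an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of five reference words is missing from the hypothesis.
    print(word_error_rate("the suspect left at noon", "the suspect left noon"))
```

A lower value means a more accurate transcript; 0.0 indicates an exact word-for-word match, and values above 1.0 are possible when the hypothesis contains many inserted words.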
    Date of Award: 2025
    Original language: English
    Supervisor: Thomas Grurl

    Study program

    • Secure Information Systems
