A Whitebox Approach - Foundational Principles of Activation-Based Tracing in Diverse Deep Learning Applications

  • Lukas Schmalzer

    Student thesis: Master's Thesis

    Abstract

    This thesis introduces Activation-Based Tracing (ABT), a novel methodology for attributing neural network predictions to specific training samples by comparing activation patterns in high-dimensional spaces. Unlike traditional explainable AI techniques that focus on input feature attribution, ABT directly links internal representations from inference-time activations to those generated during training. This approach reveals the why behind model behavior by tracing the path from learned parameters back to their originating data, bridging the gap between local interpretability and global training history. Empirical evaluations across diverse architectures - from simple classifiers to complex generative models like Generative Adversarial Networks and Denoising Diffusion Models - demonstrate that ABT consistently provides balanced and interpretable attribution. Through systematic influence validation experiments, this work confirms that activation similarity serves as a reliable proxy for training data influence. While current limitations include computational scalability and the need for architecture-specific adaptations, this thesis establishes the foundational principles of a powerful attribution method. By making the origins of AI-generated content and decisions traceable, ABT lays the groundwork for more robust data attribution systems, offering critical advances in AI transparency, intellectual property tracking, and ethical AI deployment.
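
    The core idea of comparing an inference-time activation against stored training activations can be illustrated with a minimal sketch. This is not the thesis's implementation: the use of cosine similarity, the function names `cosine_similarity` and `rank_training_samples`, and the toy three-dimensional vectors are all assumptions made for illustration, and a real system would extract activations from chosen layers of a trained network.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length activation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero activation as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_training_samples(query_activation, training_activations, top_k=3):
    """Rank training samples by activation similarity to the query.

    training_activations maps a (hypothetical) sample id to the activation
    vector recorded for that sample; the result is a list of
    (sample_id, similarity) pairs, most similar first.
    """
    scores = [
        (sample_id, cosine_similarity(query_activation, activation))
        for sample_id, activation in training_activations.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy activations, assumed to come from one layer of a trained model.
training_activations = {
    "train_0": [1.0, 0.0, 0.0],
    "train_1": [0.0, 1.0, 0.0],
    "train_2": [0.9, 0.1, 0.0],
}
query = [2.0, 0.0, 0.0]  # inference-time activation for a new input

ranking = rank_training_samples(query, training_activations, top_k=2)
for sample_id, score in ranking:
    print(f"{sample_id}: {score:.4f}")
```

    In this toy setup the query points in the same direction as `train_0`, so that sample ranks first regardless of the query's magnitude, with `train_2` second. Whether such a similarity score reflects actual training data influence is the question the thesis addresses through its influence validation experiments.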
    Date of Award: 2025
    Original language: English
    Supervisor: David Christian Schedl

    Study program

    • Interactive Media
