This thesis introduces Activation-Based Tracing (ABT), a novel methodology for attributing neural network predictions to specific training samples by comparing activation patterns in high-dimensional spaces. Unlike traditional explainable AI techniques that focus on input feature attribution, ABT directly links internal representations from inference-time activations to those generated during training. This approach reveals the "why" behind model behavior by tracing the path from learned parameters back to their originating data, bridging the gap between local interpretability and global training history. Empirical evaluations across diverse architectures, from simple classifiers to complex generative models such as Generative Adversarial Networks and Denoising Diffusion Models, demonstrate that ABT consistently provides balanced and interpretable attribution. Through systematic influence validation experiments, this work confirms that activation similarity serves as a reliable proxy for training data influence. While current limitations include computational scalability and the need for architecture-specific adaptations, this thesis establishes the foundational principles of a powerful attribution method. By making the origins of AI-generated content and decisions traceable, ABT lays the groundwork for more robust data attribution systems, offering critical advances in AI transparency, intellectual property tracking, and ethical AI deployment.
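The core comparison described above, ranking training samples by the similarity of their recorded activations to a query's inference-time activation, can be sketched as follows. This is a minimal illustration assuming cosine similarity over flattened activation vectors; the function name `trace_influence` and all data are hypothetical and do not come from the thesis itself.

```python
import numpy as np

def trace_influence(query_activation, train_activations, top_k=3):
    """Rank training samples by cosine similarity between their stored
    activations and a query's inference-time activation.

    query_activation: (d,) activation vector from one forward pass
    train_activations: (n, d) activations recorded during training
    Returns indices of the top_k most similar training samples.
    """
    q = query_activation / np.linalg.norm(query_activation)
    T = train_activations / np.linalg.norm(train_activations, axis=1, keepdims=True)
    sims = T @ q                      # cosine similarity per training sample
    return np.argsort(-sims)[:top_k]  # most similar first

# Toy example: four training samples with 3-dimensional activations
train = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(trace_influence(query, train, top_k=2))  # samples 0 and 2 align most closely
```

In practice the activations would be taken from a chosen internal layer, and storing them for every training sample is what drives the scalability limitation the abstract mentions.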
A Whitebox Approach - Foundational Principles of Activation-Based Tracing in Diverse Deep Learning Applications
Schmalzer, L. (Author). 2025
Student thesis: Master's Thesis