Analysing internal neuron activation patterns in Large Language Models is a promising way to understand their complex behaviour. However, the high dimensionality and sheer volume of these data render common analytical techniques, such as the k-nearest-neighbour (kNN) search, computationally prohibitive due to the curse of dimensionality. This thesis focuses on the kNN search and addresses the technical challenge of compressing high-dimensional activation data to enable efficient, reproducible and accurate similarity searches. Motivated and guided by the Retraceable Explainable Neural Network paradigm, this work develops and evaluates two distinct, customisable compression approaches. The first is a (near-)lossless pipeline that combines sparsity exploitation and min-max-normalised quantisation with efficient columnar storage formats to maximise data fidelity. The second is a lossy fingerprinting method that uses Principal Component Analysis and a suite of statistical metrics to create a low-dimensional proxy for each activation vector, prioritising maximal compression and search speed. Both methods were first benchmarked on the well-established MNIST dataset; the kNN search was then evaluated on activations extracted from a custom GPT-2 and an open-source, state-of-the-art Open Language Model. Their performance was quantified as the trade-off between storage efficiency, query time and preservation of the true k-nearest-neighbour structure, assessed by metrics such as Normalised Discounted Cumulative Gain and neighbour rank preservation. The experimental results reveal a clear, quantifiable performance spectrum, ranging from near-perfect kNN accuracy at moderate compression to extreme data reduction at the cost of some neighbourhood fidelity.
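The two compression approaches described above can be illustrated with a minimal sketch. The concrete parameters here (8-bit quantisation, 16 principal components, mean and standard deviation as the statistical metrics, random data standing in for activations) are illustrative assumptions, not the configuration used in the thesis:

```python
import numpy as np

def minmax_quantise(x: np.ndarray, bits: int = 8):
    """Near-lossless path: per-vector min-max normalisation, then integer quantisation."""
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale[scale == 0] = 1.0                      # guard against constant vectors
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale                          # lo/scale are kept for dequantisation

def dequantise(q, lo, scale):
    return q.astype(np.float32) * scale + lo

def fingerprint(x: np.ndarray, components: np.ndarray):
    """Lossy path: PCA projection plus simple per-vector statistics as a low-dim proxy."""
    proj = (x - x.mean(axis=0)) @ components.T   # scores on the top components
    stats = np.stack([x.mean(axis=1), x.std(axis=1)], axis=1)  # assumed statistics
    return np.hstack([proj, stats])

rng = np.random.default_rng(0)
acts = rng.standard_normal((100, 768)).astype(np.float32)  # toy stand-in activations

q, lo, scale = minmax_quantise(acts)
err = np.abs(dequantise(q, lo, scale) - acts).max()
print(f"max abs reconstruction error: {err:.4f}")         # small, bounded by scale/2

u, s, vt = np.linalg.svd(acts - acts.mean(axis=0), full_matrices=False)
fp = fingerprint(acts, vt[:16])                  # top-16 components, a hypothetical choice
print(fp.shape)                                  # → (100, 18): 16 PCA scores + 2 statistics
```

The quantised path stores one `uint8` per value plus two floats per vector, a roughly 4x reduction over `float32` before columnar encoding; the fingerprint path reduces 768 dimensions to 18 at the cost of exact neighbour recovery.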
These findings provide a practical framework and an empirical benchmark for scalable, activation-based analysis of neural networks. They establish efficient compression as a prerequisite for future research into model interpretability, debugging and data provenance.
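The neighbourhood-fidelity criterion named in the abstract, Normalised Discounted Cumulative Gain over retrieved neighbour lists, can be sketched as follows. The relevance scheme (linearly decaying with an item's rank in the true list, zero if absent) is an assumption for illustration; the thesis may weight ranks differently:

```python
import numpy as np

def ndcg_at_k(true_ids, approx_ids, k):
    """NDCG@k: how well an approximate neighbour list preserves the true one.
    Relevance of a retrieved id = k minus its rank in the true top-k, 0 if absent.
    """
    rel = {nid: k - r for r, nid in enumerate(true_ids[:k])}
    gains = [rel.get(nid, 0) for nid in approx_ids[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))       # position discount
    ideal = sum((k - r) / np.log2(r + 2) for r in range(k))          # perfect ordering
    return dcg / ideal

true_nn   = [3, 7, 1, 9, 4]       # hypothetical true 5 nearest neighbours
approx_nn = [3, 1, 7, 9, 5]       # search on compressed data: two swaps, one miss
print(round(ndcg_at_k(true_nn, approx_nn, 5), 3))                    # → 0.95
```

A score of 1.0 means the compressed search returned the exact true neighbour ranking; values below 1.0 quantify how much neighbourhood structure the compression sacrificed.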
A White-Box Approach to Scalable AI-Explainability: High-Fidelity kNN-Analysis via Compressed LLM-Activations
Geiger, L. (Author). 2025
Student thesis: Master's Thesis