Log data is a crucial source of information for monitoring, diagnosing, and understanding the behavior of complex systems. However, its sheer volume and largely unstructured nature make efficient retrieval and analysis difficult. This thesis explores the application of vector search techniques to log data, using embedding models to represent log entries in a semantic space. It begins with a brief introduction to log analytics, followed by a comprehensive exploration of vector search theory that explains various techniques for generating embeddings and measuring distances between vectors. By examining the strengths and weaknesses of different embedding models, it offers insight into the nuances of representation learning and similarity computation. The thesis then presents the development of a prototypical software system for semantic log data analysis, whose main purpose is to query log data stored in log files based on a given search string. Using semantic search, the application retrieves relevant log entries based on their contextual meaning rather than simple keyword matching, which makes querying more intuitive. The prototype is intended to have an architecture that allows straightforward integration into any log analytics tool. It is complemented by a comprehensive evaluation of its performance that highlights the benefits and limitations of vector search techniques for log data analysis. Through experiments on real-world log data, this thesis provides empirical evidence of the effectiveness of embedding models in enhancing log search capabilities. By combining theoretical analysis, practical implementation, and empirical evaluation, it aims to give a comprehensive overview of vector search techniques for log data.
By bridging the gap between theoretical research and real-world application, this work proposes techniques that can make log data management and analytics systems more effective and efficient.
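To make the idea of retrieving log entries by meaning rather than by keyword more concrete, the following minimal Python sketch shows exact (brute-force) nearest-neighbour search using cosine similarity, one common way of measuring how close two embedding vectors are. It assumes the log entries have already been embedded. The three-dimensional vectors and sample log lines are hypothetical stand-ins for the output of a real embedding model, and `cosine_similarity` and `top_k` are illustrative helpers, not part of the prototype described in this thesis.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (1.0 = same direction, 0.0 = orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          entries: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Exact nearest-neighbour search: score every entry, return the k most similar lines."""
    scored = [(cosine_similarity(query_vec, vec), line) for line, vec in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in scored[:k]]


# Hypothetical embeddings; a real system would compute these with an embedding model.
log_entries = [
    ("ERROR db: connection refused on port 5432", [0.9, 0.1, 0.0]),
    ("WARN db: connection pool exhausted", [0.7, 0.3, 0.1]),
    ("INFO http: GET /index.html 200", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.0]  # hypothetical embedding of the search string "database unreachable"

results = top_k(query, log_entries, k=2)
print(results)
```

Both database-related lines rank above the HTTP line even though neither contains the word "unreachable", which is the behavior semantic search aims for. Scoring every entry costs time linear in the number of log lines, so larger deployments typically rely on approximate nearest-neighbour indexes instead.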
On-Demand Vector Search on Log Data at Scale
Steiner, F. J. (Author). 2025
Student thesis: Master's Thesis