Offline Retrieval Augmented Generation

  • Kevin Händel

    Student thesis: Master's Thesis

    Abstract

    Offline Retrieval-Augmented Generation (RAG) systems combine domain-specific knowledge with large language models to generate accurate and contextually relevant responses while ensuring data privacy and regulatory compliance. This thesis explores the development and assessment of an offline RAG system that operates without internet connectivity on on-premises infrastructure. The system uses pre-trained language and embedding models. The embedding models convert internal data from diverse file types, including text, PDFs, and audio, into vector representations stored in a local vector database, enabling efficient retrieval. When a user submits a query, the system retrieves relevant data and augments the language model’s input with this context, improving response quality with up-to-date, specific information. The offline RAG system was evaluated against a commercial cloud-based solution on retrieval accuracy, contextual relevance, and overall response quality. The results indicate that, while the offline system shows promise, further optimization is necessary to match the performance of cloud-based counterparts. Key areas for improvement include enhancing embedding quality, refining chunking strategies, and integrating domain-specific knowledge graphs. Future work involves exploring alternative retrieval mechanisms, hybrid retrieval strategies, and reinforcement learning from human feedback to improve response generation. Additionally, hardware acceleration and distributed systems could improve scalability and performance, making the offline RAG system a viable solution for secure, enterprise-scale deployments.
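
    The retrieve-then-augment flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the thesis's implementation: the term-count `embed` function is a toy stand-in for a pre-trained embedding model, `LocalVectorStore` is a hypothetical in-memory substitute for a local vector database, and the sample chunks are invented for illustration. The final prompt would be passed to a locally hosted language model, which is omitted here.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LocalVectorStore:
    """Hypothetical in-memory stand-in for a local vector database."""

    def __init__(self):
        self.entries = []  # list of (vector, chunk_text) pairs

    def add(self, chunk):
        self.entries.append((embed(chunk), chunk))

    def search(self, query, k=2):
        # Rank stored chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(query, store, k=2):
    """Augment the user query with the retrieved context."""
    context = "\n".join(f"- {c}" for c in store.search(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Invented sample chunks standing in for indexed internal documents.
store = LocalVectorStore()
for chunk in [
    "Backups run nightly at 02:00 on the on-premises server.",
    "The cafeteria menu changes every Monday.",
    "Restore requests for backups go to the IT service desk.",
]:
    store.add(chunk)

prompt = build_prompt("When do backups run?", store, k=1)
print(prompt)
```

    In a real deployment, the toy pieces would be replaced by an actual embedding model and a persistent vector index running on local hardware; the control flow (index, retrieve, augment, generate) would stay the same.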
    Date of Award: 2024
    Original language: English (American)
    Supervisor: Christoph Schaffer
