MSc. Thesis Defense: Ahmet Yasin Aytar
ENHANCING RETRIEVAL-AUGMENTED GENERATION FOR DATA SCIENCE: A COMPREHENSIVE FRAMEWORK FOR ACADEMIC LITERATURE NAVIGATION
Ahmet Yasin Aytar
Data Science, MSc. Thesis, 2024
Thesis Jury
Assoc. Prof. Kemal Kılıç (Thesis Advisor)
Assoc. Prof. Kamer Kaya (Thesis Co-Advisor)
Prof. Yücel Saygın
Asst. Prof. Murat Kaya
Prof. Pınar Karagöz
Date & Time: 11th of December, 2024 – 09:00 AM
Place: FENS 2019
Zoom Link: https://sabanciuniv.zoom.us/j/93332486164
Keywords: Retrieval-Augmented Generation (RAG), Data Science, Literature Retrieval, Academic Insights, Large Language Models (LLM)
Abstract
In the rapidly evolving field of data science, efficiently navigating the expansive body of academic literature is crucial for informed decision-making and innovation. This paper presents an enhanced Retrieval-Augmented Generation (RAG) application designed to assist data scientists in accessing precise and contextually relevant academic resources. The application integrates several advanced techniques, namely GeneRation Of BIbliographic Data (GROBID), embedding model fine-tuning, semantic chunking, and an abstract-first retrieval method, to significantly improve the relevance and accuracy of the retrieved information. A comprehensive evaluation using the Retrieval-Augmented Generation Assessment System (RAGAS) framework demonstrates substantial improvements in key metrics, particularly Context Relevance, underscoring the system’s effectiveness in reducing information overload and enhancing decision-making processes. Our findings highlight the potential of this enhanced RAG system to transform academic exploration within data science, providing a valuable tool for researchers and practitioners alike.
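As a rough illustration of the abstract-first retrieval idea named in the abstract, the sketch below first shortlists papers by how well their abstracts match the query, then ranks text chunks only within the shortlisted papers. It is a minimal sketch under stated assumptions, not the thesis implementation: the bag-of-words `embed` function is a toy stand-in for a fine-tuned embedding model, and the function names, parameters, and sample papers are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def abstract_first_retrieve(query, papers, top_papers=1, top_chunks=2):
    """Hypothetical two-stage retrieval: abstracts first, then chunks."""
    q = embed(query)
    # Stage 1: shortlist papers whose abstracts best match the query.
    ranked = sorted(
        papers, key=lambda p: cosine(q, embed(p["abstract"])), reverse=True
    )
    shortlisted = ranked[:top_papers]
    # Stage 2: rank chunks, but only those from the shortlisted papers.
    candidates = [(p["title"], c) for p in shortlisted for c in p["chunks"]]
    candidates.sort(key=lambda tc: cosine(q, embed(tc[1])), reverse=True)
    return candidates[:top_chunks]


# Invented sample data, for illustration only.
PAPERS = [
    {
        "title": "Paper A",
        "abstract": "Gradient boosting for tabular data classification",
        "chunks": [
            "We train gradient boosting trees on tabular data.",
            "Hyperparameters were tuned with grid search.",
        ],
    },
    {
        "title": "Paper B",
        "abstract": "Image segmentation with convolutional networks",
        "chunks": [
            "Convolutional networks segment medical images.",
            "Boosting tabular data is unrelated to this chunk.",
        ],
    },
]

results = abstract_first_retrieve("gradient boosting on tabular data", PAPERS)
for title, chunk in results:
    print(title, "|", chunk)
```

In this toy setup, the second chunk of Paper B shares several words with the query, yet it is never considered because Paper B's abstract does not match. That filtering effect is the intuition behind restricting chunk search to papers whose abstracts are relevant.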