
MSc. Thesis Defense: Ahmet Yasin Aytar

ENHANCING RETRIEVAL-AUGMENTED GENERATION FOR DATA SCIENCE: A COMPREHENSIVE FRAMEWORK FOR ACADEMIC LITERATURE NAVIGATION

Ahmet Yasin Aytar
Data Science, MSc. Thesis, 2024

Thesis Jury

Assoc. Prof. Kemal Kılıç (Thesis Advisor)

Assoc. Prof. Kamer Kaya (Thesis Co-Advisor)

Prof. Yücel Saygın

Asst. Prof. Murat Kaya

Prof. Pınar Karagöz

Date & Time: 11th of December, 2024 – 09:00 AM

Place: FENS 2019

Zoom Link: https://sabanciuniv.zoom.us/j/93332486164

Keywords: Retrieval-Augmented Generation (RAG), Data Science, Literature Retrieval, Academic Insights, Large Language Models (LLM)

Abstract

In the rapidly evolving field of data science, efficiently navigating the expansive body of academic literature is crucial for informed decision-making and innovation. This paper presents an enhanced Retrieval-Augmented Generation (RAG) application designed to help data scientists access precise and contextually relevant academic resources. The application integrates several advanced techniques, including GeneRation Of BIbliographic Data (GROBID), fine-tuning of the embedding model, semantic chunking, and an abstract-first retrieval method, to significantly improve the relevance and accuracy of the retrieved information. A comprehensive evaluation using the Retrieval-Augmented Generation Assessment System (RAGAS) framework demonstrates substantial improvements in key metrics, particularly Context Relevance, underscoring the system's effectiveness in reducing information overload and supporting better decision-making. Our findings highlight the potential of this enhanced RAG system to transform academic exploration within data science, providing a valuable tool for researchers and practitioners alike.
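The abstract names an abstract-first retrieval method without describing its mechanics. As a rough illustration only, the sketch below assumes one plausible reading: first rank whole papers by how closely their abstracts match the query, then search for passages only within the top-ranked papers. It is not the thesis implementation. The bag-of-words `embed` function is a toy stand-in for the fine-tuned embedding model, and the names `abstract_first_retrieve`, `top_papers`, `top_chunks`, and the paper dictionary layout are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a dense embedding model (assumption): lowercase token counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def abstract_first_retrieve(query, papers, top_papers=2, top_chunks=3):
    """Hypothetical two-stage retrieval.

    papers: list of dicts with hypothetical keys 'id', 'abstract', 'chunks'.
    Returns up to top_chunks (paper_id, chunk) pairs.
    """
    q = embed(query)
    # Stage 1: rank whole papers by abstract similarity only.
    ranked = sorted(papers, key=lambda p: cosine(q, embed(p["abstract"])), reverse=True)
    shortlist = ranked[:top_papers]
    # Stage 2: rank chunks, but only those belonging to shortlisted papers.
    candidates = [(p["id"], c) for p in shortlist for c in p["chunks"]]
    candidates.sort(key=lambda pc: cosine(q, embed(pc[1])), reverse=True)
    return candidates[:top_chunks]


# Usage with a tiny illustrative corpus (made-up text, not thesis data).
papers = [
    {"id": "p1", "abstract": "graph neural networks for molecule property prediction",
     "chunks": ["message passing layers aggregate neighbor features"]},
    {"id": "p2", "abstract": "retrieval augmented generation for question answering",
     "chunks": ["retrieval uses dense embeddings of passages",
                "generation conditions on retrieved passages"]},
    {"id": "p3", "abstract": "time series forecasting with transformers",
     "chunks": ["retrieval of passages is not used here"]},
]
result = abstract_first_retrieve("retrieval augmented generation", papers,
                                 top_papers=1, top_chunks=2)
```

In this toy run, the chunk from `p3` mentions "retrieval" but is never scored, because its paper's abstract did not make the shortlist. That filtering step is the intuition behind searching abstracts first: passages from off-topic papers are excluded before passage-level ranking.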