MSc Thesis Defense: Feyza Teker, MULTI-DATASET LABEL SPACE UNIFICATION FOR OFF-ROAD AUTONOMOUS DRIVING USING LARGE LANGUAGE MODELS, Date & Time: 30 June, 2026 – 9:00 AM, Place: FENS L029

MULTI-DATASET LABEL SPACE UNIFICATION FOR OFF-ROAD AUTONOMOUS DRIVING USING LARGE LANGUAGE MODELS

Feyza Teker
Mechatronics Engineering, MSc Thesis, 2026

Thesis Jury

Prof. Mustafa Ünel (Thesis Advisor)

Assoc. Prof. Kemalettin Erbatur

Assoc. Prof. Ali Fuat Ergenç

Date & Time: 30 June, 2026 – 9:00 AM

Place: FENS L029

Keywords: off-road autonomous driving, semantic segmentation, multi-dataset training, large language models, ontology mapping

Abstract

Training semantic segmentation models for off-road autonomous driving is difficult because the available datasets are individually small, geographically narrow, and annotated under inconsistent label conventions. This thesis proposes a multi-dataset training framework that unifies several off-road and urban datasets under a single label space through automated, LLM-based ontology construction and knowledge distillation. The unified taxonomy and the dataset-specific mapping functions are generated by Gemini 2.5 Pro rather than built by hand. Variability in the LLM output is controlled through repeated queries, consensus filtering, and reverse cross-validation under explicit error criteria. Each master label is annotated with a discrete traversability tier, embedding navigation-relevant information directly into the label space. A teacher model fine-tuned on GOOSE generates pseudo-labels for five off-road auxiliary datasets and one urban auxiliary dataset using test-time augmentation; these pseudo-labels are constrained by each dataset's ground-truth annotations through the ontology mapping and filtered with a per-pixel confidence threshold. The student is trained with tempered dataset sampling to compensate for the size imbalance between datasets. Mask2Former and OneFormer are each evaluated both as teacher and as student. On the GOOSE validation set, the best teacher reaches 0.673 mIoU and the best student reaches 0.704 mIoU, surpassing the published GOOSE baseline and showing that automated LLM-based ontology construction can match, and even exceed, the quality of manually constructed ontologies. The framework offers a scalable path towards multi-dataset training in the off-road domain.
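The consensus-filtering idea can be pictured with a minimal Python sketch. It assumes each repeated Gemini query has already been parsed into a dictionary from source labels to master labels; the function name `consensus_mapping` and the 0.8 agreement threshold are hypothetical illustrations rather than settings from the thesis, and the reverse cross-validation stage is not modeled.

```python
from collections import Counter
from typing import Dict, List, Optional


def consensus_mapping(
    runs: List[Dict[str, str]],
    min_agreement: float = 0.8,  # hypothetical threshold, not from the thesis
) -> Dict[str, Optional[str]]:
    """Majority-vote each source label's master label across repeated LLM runs.

    A source label keeps its most frequent master label only if that label was
    chosen in at least `min_agreement` of all runs (a run that omits the label
    counts as disagreement); otherwise it maps to None, flagging it for review.
    """
    source_labels = sorted({label for run in runs for label in run})
    result: Dict[str, Optional[str]] = {}
    for label in source_labels:
        votes = Counter(run[label] for run in runs if label in run)
        master, count = votes.most_common(1)[0]
        result[label] = master if count / len(runs) >= min_agreement else None
    return result


# Three hypothetical LLM runs mapping two source labels.
runs = [
    {"gravel": "gravel", "bush": "vegetation"},
    {"gravel": "gravel", "bush": "tree"},
    {"gravel": "gravel", "bush": "vegetation"},
]
print(consensus_mapping(runs))
```

Here "gravel" is stable across all runs and is accepted, while "bush" reaches only two-thirds agreement and is left unresolved at the assumed threshold.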
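The ground-truth-constrained pseudo-labeling step can be sketched with NumPy as follows. This assumes the teacher's softmax outputs for all test-time-augmented views have already been mapped back to the original image geometry, and that the ontology mapping is encoded as a hypothetical boolean matrix `allowed[s, c]` marking which master classes are compatible with source label `s`. Renormalizing within the allowed classes, the 0.9 threshold, and the ignore index 255 are illustrative choices, not the thesis settings.

```python
import numpy as np

IGNORE_INDEX = 255  # hypothetical "ignore" value for dropped pixels


def constrained_pseudo_labels(tta_probs: np.ndarray, gt: np.ndarray,
                              allowed: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Fuse TTA teacher predictions into pseudo-labels constrained by ground truth.

    tta_probs: (A, C, H, W) softmax outputs from A augmented views, assumed to be
               already mapped back to the original image geometry.
    gt:        (H, W) integer source-label indices of the auxiliary dataset.
    allowed:   (S, C) boolean matrix; allowed[s, c] is True if master class c is
               compatible with source label s under the ontology mapping.
    """
    probs = tta_probs.mean(axis=0)                   # (C, H, W): average over views
    mask = np.moveaxis(allowed[gt], -1, 0)           # (C, H, W): classes permitted per pixel
    constrained = np.where(mask, probs, 0.0)         # zero out incompatible classes
    total = constrained.sum(axis=0, keepdims=True)   # probability mass left per pixel
    # Renormalize within the permitted classes; pixels with no mass stay all-zero.
    constrained = np.divide(constrained, total,
                            out=np.zeros_like(constrained), where=total > 0)
    labels = constrained.argmax(axis=0)
    confidence = constrained.max(axis=0)
    # Keep confident pixels; mark the rest as ignored.
    return np.where(confidence >= threshold, labels, IGNORE_INDEX)
```

With this renormalization, a source label that maps to exactly one master class always reproduces its ground truth, while a one-to-many mapping is resolved by the teacher and dropped where the teacher is uncertain.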
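Tempered sampling is commonly implemented by drawing dataset i with probability proportional to n_i^(1/T), where n_i is its size and T is a temperature. The abstract does not state the exact rule, so the Python sketch below assumes that form; the dataset names, sizes, and T = 2 are hypothetical.

```python
import random
from typing import Dict, List


def tempered_probs(sizes: Dict[str, int], temperature: float = 2.0) -> Dict[str, float]:
    """Sampling probability for each dataset, proportional to size ** (1 / temperature)."""
    weights = {name: n ** (1.0 / temperature) for name, n in sizes.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_datasets(sizes: Dict[str, int], num_batches: int,
                    temperature: float = 2.0, seed: int = 0) -> List[str]:
    """Draw, with replacement, which dataset each training batch comes from."""
    probs = tempered_probs(sizes, temperature)
    names = list(probs)
    rng = random.Random(seed)
    return rng.choices(names, weights=[probs[n] for n in names], k=num_batches)


# Hypothetical dataset sizes, for illustration only.
sizes = {"dataset_a": 8000, "dataset_b": 2000}
print(tempered_probs(sizes, temperature=2.0))
print(sample_datasets(sizes, num_batches=10))
```

For these two hypothetical datasets, T = 2 gives sampling probabilities of about 0.67 and 0.33 instead of the size-proportional 0.8 and 0.2; T = 1 recovers proportional sampling, and larger T moves toward uniform sampling.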