
MSc. Thesis Defense: Seçilay KUTAL, Automated Text Line Segmentation for Ottoman Manuscript Transcription


 

Automated Text Line Segmentation for Ottoman Manuscript Transcription

 

Seçilay KUTAL
Computer Science and Engineering, MSc. Thesis, 2025

 

Thesis Jury

Prof. Dr. Ayşe Berrin YANIKOĞLU (Thesis Advisor),

Prof. Erchan APTOULA,

Prof. Mine Elif KARSLIGİL

 

 

Date & Time: July 7th, 2025 – 11:00 AM

Place: FENS L067

Keywords: Ottoman, Manuscript, Text Line, Segmentation, Computer Vision.

 

Abstract

 

Text line segmentation is essential for the effective analysis of historical manuscripts. This step is challenging for handwritten documents, and even more so for Ottoman manuscripts, which often contain complex writing styles and overlapping lines. In this thesis, several deep learning approaches based on the U-Net and YOLO architectures are developed for automatic text line segmentation in Ottoman manuscripts. The U-Net-based approach relies on binary segmentation followed by connected-component post-processing. The YOLO-based methods comprise a single-stage approach (instance segmentation) and a two-stage approach (combining oriented bounding boxes and segmentation). These models, initially trained on Arabic datasets, were tested on a 45-page Ottoman manuscript dataset containing both straight and angled text lines, and evaluated using various segmentation metrics. The YOLO approaches, which showed promising results, were further evaluated for their effect on OCR performance using an Ottoman dataset of 199 text lines. The YOLO OBB & Segmentation method achieved the highest performance, with a precision of 99.3% on straight lines and 95% on angled lines at a 75% IoU threshold. Furthermore, this approach yielded a character error rate (CER) of 13.1% and a word error rate (WER) of 34.7% on the OCR evaluation dataset. The study also presents an Ottoman text line segmentation dataset produced using the highest-performing model.
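
For readers unfamiliar with the OCR metrics reported above, here is a minimal sketch of how CER and WER are commonly computed: the Levenshtein edit distance between the recognized text and the reference transcription, divided by the reference length in characters or words. This illustrates the standard definitions only and is not the evaluation code used in the thesis; the function names and the sample strings are assumptions made for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcription pair, for illustration only
    reference = "bu bir kitap"
    hypothesis = "bu bir kitab"
    print(f"CER = {cer(reference, hypothesis):.3f}")
    print(f"WER = {wer(reference, hypothesis):.3f}")
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER for the same output, which is consistent with the gap between the two figures in the abstract.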