MSc Thesis Defense: Alize Sevgi Yalçınkaya, Post-OCR Text Correction for Ottoman Turkish Transcription, Date & Time: May 20th, 2026 - 1:00 PM, Place: FENS L035

Post-OCR Text Correction for Ottoman Turkish Transcription

Alize Sevgi Yalçınkaya
Computer Science and Engineering, MSc Thesis, 2026

Thesis Jury

Prof. Berrin Yanıkoğlu (Thesis Advisor)

Asst. Prof. Dilara Keküllüoğlu (Thesis Co-Advisor)

Prof. Selim Balcısoy

Assoc. Prof. Öznur Taştan

Asst. Prof. Esma Bilgin Taşdemir

Date & Time: May 20th, 2026 – 1:00 PM

Place: FENS L035

Keywords: OCR post-correction, Ottoman Turkish, confidence calibration, error detection, low-resource NLP

Abstract

Historical document digitization requires effective post-OCR error correction, particularly for low-resource languages like Ottoman Turkish, which lacks modern native speakers and linguistic tools. We develop and evaluate fully automatic post-processing methods for Ottoman texts digitized with AKİS, a frozen TrOCR-based recognizer. Our corpus comprises 102,000 tokens across four books spanning 1732–1920, with token error rates of 17.3%–34.8%. We evaluate a single-stage ByT5 byte-level encoder-decoder for direct correction and three error detection methods (calibrated confidence thresholding, BERT-based classification, and confidence-fused BERT) to study whether detection signals can guide correction. Calibrated confidence thresholding achieves a token-level F1 of 0.577, BERT-based classification reaches 0.622, and confidence-fused BERT reaches 0.640. All three methods produce well-calibrated probabilities, with expected calibration error (ECE) in the 0.035–0.075 range. Both ByT5 variants achieve a character error rate (CER) of approximately 9.6% on the test set, an 8.07–8.55% relative reduction from the 10.51% OCR baseline. An error-type decomposition of the 3,092 test-set errors shows that the model copies 85% of errors unchanged and corrects 42% of circumflex substitutions, the sole subtype where targeted application yields a clear net benefit. Confidence-weighted training does not improve aggregate CER but reduces overcorrections by 19%. A quality-estimation experiment yields a negative result: detection scores cannot predict whether applying a correction will help or harm a given sequence. These findings show that OCR confidence is more useful as a training-time risk-control signal than as an inference-time selector.
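For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal Python sketch of how CER, relative reduction, and ECE are commonly computed. It is not taken from the thesis: the function names, the corpus-level CER definition, and the equal-width binary binning used for ECE are all assumptions made for illustration, and the thesis may define these quantities differently.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings with unit costs."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level CER (assumed definition): total edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars


def relative_reduction(baseline, corrected):
    """Fraction of the baseline error rate removed by correction."""
    return (baseline - corrected) / baseline


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for a binary detector (assumed variant): probs are predicted error
    probabilities, labels are 1 for a true error and 0 otherwise. Uses
    equal-width bins and the weighted mean |mean probability - observed rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        observed = sum(y for _, y in b) / len(b)
        ece += (len(b) / total) * abs(mean_p - observed)
    return ece
```

Under these definitions, a relative reduction of about 8% from a 10.51% baseline corresponds to a corrected CER just under 9.7%, which is consistent with the figures quoted in the abstract.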