
MSc. Thesis Defense: Zeynep IŞIK, Zero- and Few-Shot Dark Kinase–Phosphosite Prediction via Task-Aware Protein Embeddings

Zero- and Few-Shot Dark Kinase–Phosphosite Prediction via Task-Aware Protein Embeddings

 

Zeynep Işık
Computer Science and Engineering, MSc. Thesis, 2025

 

Thesis Jury

Assoc. Prof. Öznur Taştan (Thesis Advisor), Asst. Prof. Nur Mustafaoğlu, Prof. Arzucan Özgür

 

 

Date & Time: 10 July 2025, 15:40

Place: FENS L063

Keywords: Task Adaptation, Zero-shot Learning, Few-shot Learning, Transformers, Protein Language Models, Kinases, Phosphorylation

 

Abstract

 

Accurately mapping kinases to their substrate phosphosites is fundamental to decoding cellular signaling and understanding disease mechanisms. High-throughput techniques can identify phosphosites, but determining which kinase catalyzes each phosphorylation event remains challenging. As a result, over 95% of experimentally detected human phosphosites lack kinase annotations. The kinase–phosphosite association problem can be formulated as a supervised multi-class classification task; however, a large portion of human kinases are understudied (dark kinases) and have few or no associated phosphosites, which places them outside the reach of conventional supervised learning methods. In this thesis, we formulate kinase–phosphosite association as zero-shot and few-shot learning tasks: in the zero-shot setting, the model must predict associations for kinases never seen during training; in the few-shot setting, it may leverage only a handful of labeled examples.
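To make the zero-shot and few-shot settings described above concrete, the following is a minimal sketch of how (phosphosite, kinase) pairs could be split by kinase. It is an illustration only, not the thesis's actual data pipeline: the function name `split_zero_and_few_shot`, its parameters, and the toy peptides and kinase IDs are all hypothetical.

```python
import random
from collections import defaultdict


def split_zero_and_few_shot(pairs, held_out_fraction=0.3, k_shot=0, seed=0):
    """Split (site, kinase) pairs by kinase so held-out kinases stay unseen.

    pairs: list of (site_sequence, kinase_id) tuples.
    k_shot: labeled examples per held-out kinase that are moved into training.
            0 gives a zero-shot split; a small positive value gives a few-shot split.
    """
    by_kinase = defaultdict(list)
    for site, kinase in pairs:
        by_kinase[kinase].append(site)

    # Choose the held-out kinases reproducibly.
    kinases = sorted(by_kinase)
    rng = random.Random(seed)
    rng.shuffle(kinases)
    n_held_out = max(1, int(len(kinases) * held_out_fraction))
    held_out = set(kinases[:n_held_out])

    train, test = [], []
    for kinase, sites in by_kinase.items():
        if kinase not in held_out:
            train.extend((s, kinase) for s in sites)
        else:
            # Few-shot: expose only the first k_shot examples during training.
            train.extend((s, kinase) for s in sites[:k_shot])
            test.extend((s, kinase) for s in sites[k_shot:])
    return train, test, held_out


# Hypothetical toy data: made-up peptides and kinase IDs, for illustration only.
toy_pairs = [
    ("RRASVAG", "K1"), ("KRKSIDE", "K1"), ("PLSPTRL", "K2"),
    ("EEEYVNG", "K3"), ("DDDYEEL", "K3"), ("GSPSPKK", "K4"),
]
zs_train, zs_test, zs_held_out = split_zero_and_few_shot(toy_pairs, 0.5, k_shot=0)
fs_train, fs_test, fs_held_out = split_zero_and_few_shot(toy_pairs, 0.5, k_shot=1)
```

With `k_shot=0`, no held-out kinase appears among the training labels, so a standard multi-class classifier would have no output class for it; this is the gap that the zero-shot formulation addresses.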

We employ transformer-based protein language models (pLMs) to embed both kinase domains and phosphosite peptides, and we systematically explore domain-adaptation strategies under severe data constraints, ranging from full fine-tuning and partial layer re-initialization to task-specific pre-training. Surprisingly, an ESM-1b model trained de novo outperforms its fully fine-tuned pretrained counterpart, suggesting that general-purpose pLM embeddings may lack task-specific biochemical context. Our best results are obtained by combining kinase- and phosphosite-aware pLMs with partial re-initialization of the upper transformer layers. On the DARKIN benchmark, this approach delivers state-of-the-art performance in both zero-shot and few-shot kinase prediction, offering a promising direction for illuminating the dark phosphoproteome.
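The idea of partially re-initializing upper layers can be sketched in a framework-free toy form, shown below. This is a simplified illustration under stated assumptions, not the thesis's implementation: each "layer" is just a small weight matrix, and the helper names `make_layer` and `reinit_top_layers`, the layer count, the number of re-initialized layers, and the Gaussian initialization scale are all hypothetical. A real setup would apply the same idea to the transformer blocks of a pLM in a deep learning framework.

```python
import copy
import random


def make_layer(dim, rng, scale=0.02):
    """Create one toy layer as a dim x dim matrix of small Gaussian values."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


def reinit_top_layers(layers, n_reinit, seed=0):
    """Return a copy of `layers` whose last `n_reinit` layers are freshly initialized.

    Lower layers keep their existing (pretrained) weights; the upper layers
    restart from random values and are then meant to be trained on the task.
    """
    if not 0 <= n_reinit <= len(layers):
        raise ValueError("n_reinit must be between 0 and the number of layers")
    rng = random.Random(seed)
    new_layers = copy.deepcopy(layers)  # leave the original weights untouched
    for i in range(len(layers) - n_reinit, len(layers)):
        new_layers[i] = make_layer(len(layers[i]), rng)
    return new_layers


# Hypothetical stand-in for a pretrained 6-layer model.
pretrain_rng = random.Random(42)
pretrained = [make_layer(4, pretrain_rng) for _ in range(6)]

# Keep the bottom 4 layers, re-initialize the top 2.
adapted = reinit_top_layers(pretrained, n_reinit=2, seed=7)
```

The design intuition, stated cautiously, is that lower layers tend to hold more general sequence features worth keeping, while resetting the upper layers gives the model room to learn task-specific representations.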