MSc. Thesis Defense: Emine Beyza Çandır Soydemir

UNVEILING THE TRUE IMPACT OF DRUG AND CELL LINE REPRESENTATIONS IN DRUG SYNERGY PREDICTION

Emine Beyza Çandır Soydemir
Computer Science and Engineering, MSc. Thesis, 2024

Thesis Jury

Assoc. Prof. Öznur Taştan (Thesis Advisor), Asst. Prof. Onur Varol, Assoc. Prof. Abdullah Ercüment Çiçek

Date & Time: December 18th, 2024 – 10:30 AM

Place: FENS L030

Keywords: Drug Synergy, Deep Learning, Generalization, One-Hot Encoding

Abstract

Drug combination therapy holds promise as an effective strategy for treating complex diseases such as cancer. However, because the space of possible drug combinations is vast, screening all of them experimentally is not feasible. Computational models have been developed to prioritize drug pairs likely to act synergistically and thereby accelerate experimental screening efforts. These models are trained on large datasets of previously reported drug combination measurements and use rich representations of drugs and cell lines that encode chemical, structural, and biological properties.

In this thesis, we first aimed to improve upon our previous synergy predictor, MatchMaker, by incorporating richer biological information, such as pathways and mechanisms of action, or alternative drug representations. Despite all our efforts, none of these models outperformed MatchMaker. Motivated by these findings, we tested a more straightforward approach that replaces detailed feature representations with one-hot encodings of drugs and cell lines. Surprisingly, these models, stripped of chemical and biological information, come very close to the performance of models trained with rich biological and chemical information.
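To make the one-hot baseline concrete, the following sketch shows one way to turn a (drug A, drug B, cell line) triplet into model inputs. It assumes a MatchMaker-style layout in which each drug's features are concatenated with the cell line's features before entering a drug-specific subnetwork; the function names and the small drug and cell line panel are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def build_index(names: Sequence[str]) -> Dict[str, int]:
    """Assign each distinct name a fixed column (sorted for reproducibility)."""
    return {name: i for i, name in enumerate(sorted(set(names)))}


def one_hot(name: str, index: Dict[str, int]) -> List[int]:
    """Return a vector with a single 1 at the name's column; raises KeyError for unseen names."""
    vec = [0] * len(index)
    vec[index[name]] = 1
    return vec


def encode_triplet(
    drug_a: str,
    drug_b: str,
    cell_line: str,
    drug_index: Dict[str, int],
    cell_index: Dict[str, int],
) -> Tuple[List[int], List[int]]:
    """Hypothetical MatchMaker-style input: [drug one-hot | cell line one-hot] for each drug."""
    cell_vec = one_hot(cell_line, cell_index)
    return one_hot(drug_a, drug_index) + cell_vec, one_hot(drug_b, drug_index) + cell_vec


# Illustrative panel only (not data from the thesis).
drug_index = build_index(["Paclitaxel", "Cisplatin", "5-FU"])
cell_index = build_index(["MCF7", "A549"])

x_a, x_b = encode_triplet("Cisplatin", "Paclitaxel", "MCF7", drug_index, cell_index)
print(x_a)  # [0, 1, 0, 0, 1]
print(x_b)  # [0, 0, 1, 0, 1]
```

Each drug and cell line is reduced to a single active column, so the model sees only which drug and which cell line are involved, never a chemical or biological property.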

In this thesis, we systematically evaluated published synergy prediction models by replacing their drug representations and cell line features with simple one-hot encodings of drugs and cell lines across various evaluation settings. Regardless of the drug input features or the architecture, the simple one-hot encoding baseline performs comparably in all models. This unexpected result suggests that the representations serve merely as identifiers and that the models capture general co-variation patterns in synergy measurements rather than learning chemical or biological information. This may explain why the models do not generalize well to new drugs and cell lines. While synergy prediction models remain useful for deciding which pairs to test within a fixed panel of drugs and cell lines, these results demonstrate that alternative approaches are needed to develop synergy prediction models that work across new drugs, cell lines, and patients.
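One common way to probe generalization to new drugs, assumed here rather than taken from the thesis, is a leave-drug-out split in which every combination involving a held-out drug goes to the test set. The following sketch (hypothetical function names, toy data) shows such a split and why a one-hot identifier offers nothing to transfer: a held-out drug never appears during training, so its one-hot column is always zero there.

```python
from typing import List, Set, Tuple

Triplet = Tuple[str, str, str]  # (drug_a, drug_b, cell_line)


def leave_drugs_out(
    triplets: List[Triplet], held_out: Set[str]
) -> Tuple[List[Triplet], List[Triplet]]:
    """Send every triplet that involves a held-out drug to the test set."""
    train: List[Triplet] = []
    test: List[Triplet] = []
    for drug_a, drug_b, cell in triplets:
        target = test if drug_a in held_out or drug_b in held_out else train
        target.append((drug_a, drug_b, cell))
    return train, test


def active_drug_columns(triplets: List[Triplet]) -> Set[str]:
    """Drugs whose one-hot column is nonzero at least once in these triplets."""
    return {drug for drug_a, drug_b, _ in triplets for drug in (drug_a, drug_b)}


# Toy data for illustration only.
triplets = [
    ("5-FU", "Cisplatin", "A549"),
    ("5-FU", "Paclitaxel", "MCF7"),
    ("Cisplatin", "Paclitaxel", "A549"),
]
train, test = leave_drugs_out(triplets, held_out={"Paclitaxel"})

print(train)  # [('5-FU', 'Cisplatin', 'A549')]
print(len(test))  # 2
# The held-out drug's column is never active during training.
print(active_drug_columns(train) & {"Paclitaxel"})  # set()
```

With a plain dense first layer and no weight decay, the weights attached to an always-zero input column receive zero gradient and stay at their initial values, so the trained model holds no information specific to the held-out drug.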