
MSc. Thesis Defense: Can Aksoy

A BRAIN-COMPUTER INTERFACE FOR OBJECT SELECTION AND TRACKING IN VIDEOS

CAN AKSOY
Electronics Engineering MSc. Thesis, 2024

Thesis Jury

Assoc. Prof. HÜSEYİN ÖZKAN (Thesis Supervisor)

Asst. Prof. NİHAN ALP (Thesis Co-advisor)

Prof. İBRAHİM TEKİN

Assoc. Prof. ERCHAN APTOULA

Asst. Prof. TUNA ÇAKAR

Date & Time: December 20th, 2024 – 10 AM

Place: FENS L045

Keywords: Brain-Computer Interface, BCI, Steady-State Visually Evoked Potential, SSVEP, Electroencephalogram, EEG, Unmanned Aerial Vehicles, UAV, Canonical Correlation Analysis, CCA

Abstract

Brain-computer interfaces (BCIs) are crucial technologies that enable the generation of computer commands solely from brain signals. These systems have applications in various fields, including robotic control and neuromarketing. An effective stimulation method in BCIs is based on steady-state visual evoked potentials (SSVEPs), which are measured non-invasively using electroencephalography (EEG). SSVEP-based BCIs offer a high signal-to-noise ratio (SNR), making them advantageous for real-time applications. Potential users of such BCIs include operators of defense UAVs (unmanned aerial vehicles) tasked with surveillance, as well as individuals interacting with multimedia systems for entertainment. Hence, two particularly important BCI functionalities are object selection and tracking in videos.

In this scope, the thesis presents a comprehensive analysis of object selection and tracking in videos using EEG and eye-tracking data, leading to a fusion-based approach. We use two experimental SSVEP-based BCI setups for data collection: an invisible grid and a visible grid. In both setups, the computer screen is divided into a rectangular grid, and each grid region is assigned a distinct frequency, as sketched below. In the invisible grid setup, the spatial division into the rectangular grid is not visible to the user/participant, and each moving object in the video flickers for SSVEP stimulation at the frequency of the region that the object is in. In contrast, the visible grid setup explicitly shows the spatial divisions on the screen to the user/participant, and the grid regions themselves, rather than the objects, flicker at the corresponding distinct frequencies, facilitating the object tracking application. Using these two setups, we conducted EEG experiments with 12 participants. Each experiment consisted of 16 videos containing 3, 4, or 5 moving objects (humans or vehicles), and each video was repeated 16 times, resulting in a total of 256 trials per participant. In each trial, participants were instructed to focus on the indicated object while EEG signals were collected. The videos were sourced from civilian and defense contexts, offering various scenarios for object selection and tracking analysis.
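To make the setup concrete, the following is a minimal sketch of the region-to-frequency assignment described above. The grid dimensions, screen resolution, and frequency values are illustrative assumptions, not the thesis's actual parameters.

import numpy as np

# Hypothetical grid layout and frequency band; the thesis's actual values may differ.
GRID_ROWS, GRID_COLS = 3, 4
SCREEN_W, SCREEN_H = 1920, 1080
FREQS = np.linspace(8.0, 15.0, GRID_ROWS * GRID_COLS)  # one flicker frequency (Hz) per cell

def region_frequency(x, y):
    """Return the flicker frequency of the grid cell containing pixel (x, y)."""
    col = min(int(x / SCREEN_W * GRID_COLS), GRID_COLS - 1)
    row = min(int(y / SCREEN_H * GRID_ROWS), GRID_ROWS - 1)
    return FREQS[row * GRID_COLS + col]

# Invisible grid setup: each moving object flickers at the frequency of the
# region it currently occupies; in the visible grid setup, the grid cells
# themselves flicker at these same frequencies.
print(region_frequency(640, 300))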

For object selection, a trial is considered successful if the EEG signal processing (i.e., SSVEP decoding) correctly predicts the object the participant picked. Canonical Correlation Analysis (CCA) was used to decode the SSVEP signals effectively. Participants in the invisible grid setup achieved selection accuracies of 48% without averaging EEG data and 79% when averaging EEG data across the repetitions of each video, significantly exceeding the average chance level of around 27%. In the visible grid setup, accuracies reached 51% without averaging and 82% with averaging, consistently surpassing the chance level. Per-video analyses highlighted the advantages of averaging, especially in challenging scenarios with overlapping or ambiguous objects.
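The following is a minimal sketch of CCA-based SSVEP decoding in the standard form (canonical correlation between multichannel EEG and sinusoidal reference templates, picking the candidate frequency with the largest correlation); it is an illustration of the general technique, not the thesis's exact pipeline. The sampling rate, channel count, harmonic count, and candidate frequencies are assumed placeholders.

import numpy as np
from sklearn.cross_decomposition import CCA

FS = 250          # sampling rate in Hz (assumed)
N_HARMONICS = 2   # number of sinusoidal harmonics in the reference set (assumed)

def reference_signals(freq, n_samples, fs=FS, n_harmonics=N_HARMONICS):
    """Sine/cosine templates at the stimulation frequency and its harmonics."""
    t = np.arange(n_samples) / fs
    refs = []
    for h in range(1, n_harmonics + 1):
        refs.append(np.sin(2 * np.pi * h * freq * t))
        refs.append(np.cos(2 * np.pi * h * freq * t))
    return np.column_stack(refs)

def decode_ssvep(eeg, candidate_freqs):
    """Return the candidate frequency whose reference templates yield the
    largest canonical correlation with the EEG segment.

    eeg: array of shape (n_samples, n_channels)."""
    scores = []
    for f in candidate_freqs:
        Y = reference_signals(f, eeg.shape[0])
        cca = CCA(n_components=1)
        Xc, Yc = cca.fit_transform(eeg, Y)
        scores.append(np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1])
    return candidate_freqs[int(np.argmax(scores))]

# Example with synthetic data: a 2-second, 8-channel EEG segment.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((2 * FS, 8))
print(decode_ssvep(eeg, [8.0, 10.0, 12.0, 15.0]))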

The focus then shifts to evaluating object tracking performance using SSVEP signals in the visible grid setup, leveraging CCA and a windowing approach for signal processing. For object tracking, a trial within a window is considered successful if the SSVEP decoding correctly predicts the object in that window. Although standalone eye tracking offers high spatial precision, its accuracy is limited by calibration drift and gaze-tracking loss, with average accuracies of 43.75% for a 1.25-second window and 35.67% for a 2.5-second window. To address these limitations, a fusion-based methodology combining EEG and eye-tracking data was developed, allowing the system to prioritize eye-tracking data when properly calibrated while falling back to EEG data in the case of calibration issues or inconsistencies in the eye-tracking data. This fusion method increased the overall accuracy to 51.04% for a window size of 1.25 seconds and 48.96% for a window size of 2.5 seconds, demonstrating the added value of incorporating EEG signals.

Furthermore, the Root Mean Square Error (RMSE), calculated as the square root of the mean squared difference between the actual and predicted grid positions of the target object, serves as a complementary metric for tracking performance by quantifying prediction errors; a lower RMSE indicates better alignment between predictions and actual positions. The fusion method notably reduced the RMSE compared to the standalone modalities, decreasing it from 1.99 and 2.27 (eye tracker only) to 1.09 and 1.15 for the 1.25- and 2.5-second windows, respectively. This indicates that the fusion method effectively compensates for the shortcomings of the individual modalities. Overall, these findings demonstrate that SSVEP-based BCIs are highly effective for object selection and tracking, providing robust performance in complex scenarios. This thesis highlights the potential of such systems to improve real-time decision making and interaction in dynamic environments, paving the way for future applications in fields ranging from defense to multimedia technologies.
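As an illustration of the fusion rule and the RMSE metric just described, the following is a minimal sketch under two simplifying assumptions of ours, not the thesis's: gaze reliability is reduced to a per-window boolean validity mask, and grid positions are represented as scalar cell indices.

import numpy as np

def fuse_predictions(gaze_pred, gaze_valid, eeg_pred):
    """Per-window fusion: use the eye tracker's prediction when its data
    are valid (e.g., calibration OK, no tracking loss), else fall back to
    the SSVEP/EEG prediction. Inputs are arrays of grid-cell indices;
    gaze_valid is a boolean mask."""
    return np.where(gaze_valid, gaze_pred, eeg_pred)

def rmse(predicted, actual):
    """Root mean square error between predicted and actual grid positions."""
    return float(np.sqrt(np.mean((np.asarray(predicted) - np.asarray(actual)) ** 2)))

# Example: 6 windows, with eye tracking lost in two of them.
gaze_pred  = np.array([3, 3, 0, 4, 4, 0])
gaze_valid = np.array([True, True, False, True, True, False])
eeg_pred   = np.array([3, 2, 4, 4, 5, 5])
actual     = np.array([3, 3, 4, 4, 5, 5])

fused = fuse_predictions(gaze_pred, gaze_valid, eeg_pred)
print(fused, rmse(fused, actual))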