Sneaky emotions: impact of data partitions in affective computing experiments with brain-computer interfacing.
Yoelvis Moreno-AlcaydeV Javier TraverLuis A LeivaPublished in: Biomedical engineering letters (2023)
Brain-Computer Interfacing (BCI) has shown promise in Machine Learning (ML) for emotion recognition. Unfortunately, how data are partitioned in training/test splits is often overlooked, which makes it difficult to attribute research findings to actual modeling improvements or to partitioning issues. We introduce the "data transfer rate" construct (i.e., how much data of the test samples are seen during training) and use it to examine data partitioning effects under several conditions. As a use case, we consider emotion recognition in videos using electroencephalogram (EEG) signals. Three data splits are considered, each representing a relevant BCI task: subject-independent (affective decoding), video-independent (affective annotation), and time-based (feature extraction). Model performance may change significantly (ranging e.g. from 50% to 90%) depending on how data is partitioned, in classification accuracy. This was evidenced in all experimental conditions tested. Our results show that (1) for affective decoding, it is hard to achieve performance above the baseline case (random classification) unless some data of the test subjects are considered in the training partition; (2) for affective annotation, having data from the same subject in training and test partitions, even though they correspond to different videos, also increases performance; and (3) later signal segments are generally more discriminative, but it is the number of segments (data points) what matters the most. Our findings not only have implications in how brain data are managed, but also in how experimental conditions and results are reported.