Neural correlates of face perception modeled with a convolutional recurrent neural network.
Jamie A O'ReillyJordan J WehrmanAaron CareyJennifer BedwinThomas HournFawad AsadiPaul F SowmanPublished in: Journal of neural engineering (2023)
Event-related potential (ERP) sensitivity to faces is predominantly characterized by an N170 peak that has greater amplitude and shorter latency when elicited by human faces than images of other objects. We aimed to develop a computational model of visual ERP generation to study this phenomenon which consisted of a three-dimensional convolutional neural network (CNN) connected to a recurrent neural network (RNN).
Approach: The CNN provided image representation learning, complimenting sequence learning of the RNN for modeling visually-evoked potentials. We used open-access data from ERP CORE (40 subjects) to develop the model, generated synthetic images for simulating experiments with a generative adversarial network, then collected additional data (16 subjects) to validate predictions of these simulations. For modeling, visual stimuli presented during ERP experiments were represented as sequences of images (time x pixels). These were provided as inputs to the model. By filtering and pooling over spatial dimensions, the CNN transformed these inputs into sequences of vectors that were passed to the RNN. The ERP waveforms evoked by visual stimuli were provided to the RNN as labels for supervised learning. The whole model was trained end-to-end using data from the open-access dataset to reproduce ERP waveforms evoked by visual events. 
Main results: Cross-validation model outputs strongly correlated with open-access (r = 0.98) and validation study data (r = 0.78). Open-access and validation study data correlated similarly (r = 0.81). Some aspects of model behavior were consistent with neural recordings while others were not, suggesting promising albeit limited capacity for modeling the neurophysiology of face-sensitive ERP generation.
Significance: The approach developed in this work is potentially of significant value for visual neuroscience research, where it may be adapted for multiple contexts to study computational relationships between visual stimuli and evoked neural activity.
.