Text and image generation from intracranial electroencephalography using an embedding space for text and images.

Yuya Ikegawa, Ryohei Fukuma, Hidenori Sugano, Satoru Oshino, Naoki Tani, Kentaro Tamura, Yasushi Iimura, Hiroharu Suzuki, Shota Yamamoto, Yuya Fujita, Shinji Nishimoto, Haruhiko Kishima, Takufumi Yanagisawa
Published in: Journal of Neural Engineering (2024)
Objective. Invasive brain-computer interfaces (BCIs) are promising communication devices for severely paralyzed patients. Recent advances in intracranial electroencephalography (iEEG) coupled with natural language processing have enhanced communication speed and accuracy. Notably, such speech BCIs rely on signals from the motor cortex; BCIs based on motor cortical activity may therefore suffer signal deterioration in users with motor cortical degenerative diseases such as amyotrophic lateral sclerosis. An alternative to motor cortical iEEG is needed to support patients with such conditions.

Approach. In this study, a multimodal embedding of text and images was used to decode visual semantic information from iEEG signals of the visual cortex to generate text and images. We used contrastive language-image pretraining (CLIP) embeddings to represent images presented to 17 patients implanted with electrodes in the occipital and temporal cortices. A CLIP image vector was inferred from the high-γ power of the iEEG signals recorded while the patients viewed the images.

Main results. Text was generated by ClipCap from the inferred CLIP vector with better-than-chance accuracy. An image was then created from the generated text using Stable Diffusion, with accuracy significantly above chance.

Significance. The text and images generated from iEEG through the CLIP embedding vector can be used for improved communication.
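The abstract describes a four-stage pipeline: extract high-γ power features from visual-cortex iEEG, map them to a CLIP image embedding, decode that embedding into text, and render an image from the text. The sketch below illustrates one plausible implementation of that flow. It is not the authors' code: the ridge-regression decoder, the file names, the CLIP dimensionality (512, as for ViT-B/32), and the Stable Diffusion checkpoint are all assumptions, and the ClipCap step is left as a placeholder because the paper's exact decoding setup is not given here.

# Hypothetical sketch of the iEEG-to-text-to-image pipeline in the abstract.
# Assumptions (not from the paper): ridge regression as the iEEG-to-CLIP map,
# 512-dim CLIP embeddings, and Stable Diffusion v1.5 via the diffusers library.

import numpy as np
from sklearn.linear_model import Ridge

# Step 1: learn a map from high-gamma iEEG features to CLIP image vectors.
# X_train: (n_trials, n_channels * n_timebins) high-gamma power features.
# Y_train: (n_trials, 512) CLIP embeddings of the presented images.
# File names below are hypothetical placeholders.
X_train = np.load("hg_power_train.npy")
Y_train = np.load("clip_vectors_train.npy")

decoder = Ridge(alpha=1.0)  # regularization strength is a guess
decoder.fit(X_train, Y_train)

# Step 2: infer a CLIP image vector from a held-out iEEG trial.
x_test = np.load("hg_power_test.npy")  # (1, n_features)
clip_vec = decoder.predict(x_test)     # (1, 512)
clip_vec /= np.linalg.norm(clip_vec)   # CLIP vectors are typically L2-normalized

# Step 3: generate a caption from the inferred vector. ClipCap maps a CLIP
# embedding to a language-model prefix and decodes text; its API depends on
# the checkpoint used, so a placeholder stands in here.
def clipcap_generate(vec: np.ndarray) -> str:
    raise NotImplementedError("load a ClipCap checkpoint and decode text here")

caption = clipcap_generate(clip_vec)

# Step 4: render an image from the generated caption with Stable Diffusion.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe(caption).images[0]
image.save("decoded.png")

Note that routing through text (steps 3-4) rather than conditioning image generation directly on the inferred embedding is the design the abstract reports; the intermediate caption itself serves as one of the two communication outputs.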