Posted: 2022/12/29 10:03 | Author: NICA

The first author of the article is Du Changde, title:“Review of visual neural encoding and decoding methods in fMRI”

Abstract :

The relationship between human visual experience and evoked neural activity is central to the field of computational neuroscience. The purpose of visual neural encoding and decoding is to study the relationship between visual stimuli and the evoked neural activity by using neuroimaging data such as functional magnetic resonance imaging (fMRI). Neural encoding researches attempt to predict the brain activity according to the presented external stimuli, which contributes to the development of brain science and brain-like artificial intelligence. Neural decoding researches attempt to predict the information about external stimuli by analyzing the observed brain activities, which can interpret the state of human visual perception and promote the development of brain computer interface (BCI). Therefore, fMRI based visual neural encoding and decoding researches have important scientific significance and engineering value. Typically, the encoding models are based on the specific computations that are thought to underlie the observed brain responses for specific visual stimuli. Early studies of visual neural encoding relied heavily on Gabor wavelet features because these features are very good at modeling brain responses in the primary visual cortex. Recently, given the success of deep neural networks (DNNs) in classifying objects in natural images, the representations within these networks have been used to build encoding models of cortical responses to complex visual stimuli. Most of the existing decoding studies are based on multi-voxel pattern analysis (MVPA) method, but brain connectivity pattern is also a key feature of the brain state and can be used for brain decoding. Although recent studies have demonstrated the feasibility of decoding the identity of binary contrast patterns, handwritten characters, human facial images, natural picture/video stimuli and dreams from the corresponding brain activation patterns, the accurate reconstruction of the visual stimuli from fMRI still lacks adequate examination and requires plenty of efforts to improve. On the basis of summarizing the key technologies and research progress of fMRI based visual neural encoding and decoding, this paper further analyzes the limitations of existing visual neural encoding and decoding methods. In terms of visual neural encoding, the development process of population receptive field (pRF) estimation method is introduced in detail. In terms of visual neural decoding, it is divided into semantic classification, image identification and image reconstruction according to task types, and the representative research work of each part and the methods used are described in detail. From the perspective of machine learning, semantic classification is a single label or multi-label classification problem. Simple visual stimuli only contain a single object, while natural visual stimuli often contain multiple semantic labels. For example, an image may contain flowers, water, trees, cars, etc. Predicting one or more semantic labels of the visual stimulus from the brain signal is called semantic decoding. Image retrieval based on brain signal is also a common visual decoding task where the model is created to "decode" neural activity by retrieving a picture of what a person has just seen or imagined. In particular, the reconstruction techniques of simple image, face image and complex natural image based on deep generative models (including variational auto-encoders (VAEs) and generative adversarial networks (GANs)) are introduced in the part of image reconstruction. Secondly, 10 open source datasets commonly used in this field were statistically sorted out, and the sample size, number of subjects, types of stimuli, research purposes and download links of the datasets were summarized in detail. These datasets have made important contributions to the development of this field. Finally, we introduce the commonly used measurement metrics of visual neural encoding and decoding model in detail, analyze the shortcomings of current visual neural encoding and decoding methods, propose feasible suggestions for improvement, and show the future development directions. Specifically, for neural encoding, the existing methods still have the following shortcomings:1) the computational models are mostly based on the existing neural network architecture, which cannot reflect the real biological visual information flow; 2) due to the selective attention of each person in the visual perception and the inevitable noise in the fMRI data collection, individual differences are significant; 3) the sample size of the existing fMRI data set is insufficient; 4) most researchers construct feature spaces of neural encoding models based on fixed types of pre-trained neural networks (such as AlexNet), causing problems such as insufficient diversity of visual features. On the other hand, although the existing visual neural decoding methods perform well in the semantic classification and image identification tasks, it is still very difficult to establish an accurate mapping between visual stimuli and visual neural signals, and the results of image reconstruction are often blurry and lack of clear semantics. Moreover, most of the existing visual neural decoding methods are based on linear transformation or deep network transformation of visual images, lacking exploration of new visual features. Factors that hinder researchers from effectively decoding visual information and reconstructing images or videos mainly include high dimension of fMRI data, small sample size and serious noise. In the future, more advanced artificial intelligence technology should be used to develop more effective methods of neural encoding and decoding, and try to translate brain signals into images, video, voice, text and other multimedia content, so as to achieve more BCI applications. The significant research directions include 1) multi-modal neural encoding and decoding based on the union of image and text; 2) brain-guided computer vision model training and enhancement; 3) visual neural encoding and decoding based on the high efficient features of large-scale pre-trained models. In addition, since brain signals are characterized by complexity, high dimension, large individual diversity, high dynamic nature and small sample size, future research needs to combine computational neuroscience and artificial intelligence theories to develop visual neural encoding and decoding methods with higher robustness, adaptability and interpretability.

link:http://www.cjig.cn/jig/ch/reader/view_abstract.aspx?file_no=20230204