Diagnosis of schizophrenia with functional connectome data: a graph-based convolutional neural network approach

Previous deep learning methods have not captured graph or network representations of brain structural or functional connectome data. To address this, we developed the BrainNet-Global Covariance Pooling-Attention Convolutional Neural Network (BrainNet-GA CNN) by incorporating BrainNetCNN and global covariance pooling into the self-attention mechanism. Resting-state functional magnetic resonance imaging data were obtained from 171 patients with schizophrenia spectrum disorders (SSDs) and 161 healthy controls (HCs). We conducted an ablation analysis of the proposed BrainNet-GA CNN and quantitative performance comparisons with competing methods using the nested tenfold cross validation strategy. The performance of our model was compared with competing methods. Discriminative connections were visualized using the gradient-based explanation method and compared with the results obtained using functional connectivity analysis. The BrainNet-GA CNN showed an accuracy of 83.13%, outperforming other competing methods. Among the top 10 discriminative connections, some were associated with the default mode network and auditory network. Interestingly, these regions were also significant in the functional connectivity analysis. Our findings suggest that the proposed BrainNet-GA CNN can classify patients with SSDs and HCs with higher accuracy than other models. Visualization of salient regions provides important clinical information. These results highlight the potential use of the BrainNet-GA CNN in the diagnosis of schizophrenia. Supplementary Information The online version contains supplementary material available at 10.1186/s12868-021-00682-9.


Introduction
Convolutional neural networks (CNNs) are extremely efficient architectures in image and audio recognition tasks. CNNs performed better than other DNNs in the classification of Alzheimer's disease versus mild cognitive impairment or normal controls [1,2]. We also previously reported 84.15-84.43% classification accuracies for schizophrenia (SZ) using a 3D CNN model, outperforming support vector machine (SVM) and other 3D CNN models [3]. However, a critical limitation of conventional CNNs is that receptive fields of their filters for feature extraction do not exactly capture graph or network representations of structural or functional connectome data of the brain. Recent research has shown that the representations produced by CNNs can be strengthened by integrating learning mechanisms into the network that help capture graph or network representations between features; one of these models is the BrainNetCNN [4].
The BrainNetCNN, a type of CNN, is composed of novel edge-to-edge (E2E), edge-to-node (E2N), and node-to-graph (N2G) convolutional filters that leverage the topological locality of brain network data. With structural connectome data, the BrainNetCNN framework outperformed a variety of other methods in predicting neurodevelopment [5]. Another model is a global covariance pooling (second-order pooling). Compared with global average pooling (first-order pooling) in existing deep CNNs, global covariance pooling produces covariance matrices deciphering higher order representations with the potential to enhance the nonlinear modeling capacity of deep CNNs [6]. However, a drawback of global covariance pooling is that the second-order pooling block is only applicable at the end of the network. To overcome this, Gao and colleagues proposed a novel network model introducing global second-order pooling across lower to higher layers to exploit holistic image information throughout a network [7]. With the self-attention strategy [8], high-order statistical representations can be trained at every layer, outperforming other methods [6]. Based on current trends, we hypothesized that the BrainNetCNN framework combined with global covariance pooling and self-attention model would achieve a higher performance with functional connectome data. We named this new model the Brain-Net-Global covariance pooling-Attention Convolutional Neural Network (BrainNet-GA CNN). The aims of the present study were to perform an ablation study using the BrainNet-GA CNN to analyze resting-state functional magnetic resonance imaging (rsfMRI) data and compare its accuracy in classifying SZ versus healthy controls (HCs) with those of other networks. In addition, we sought to develop an explainable saliency map showing significant connections discriminating between SZ and HCs. These connections were compared to the results of functional connectivity (FC) obtained with univariate analysis.

Demographic and clinical characteristics
The diagnoses of patients were SZ (n = 128), schizophreniform disorder (n = 40), and schizoaffective disorder (n = 3). There were no significant differences in age and sex between the SSD and HC groups. However, education was lower in the SSD group compared to the HC group (Table 1).

Ablation study on the BrainNet-GA CNN
When two convolutional layers with the Net-GA block were used, the highest accuracy obtained was 83.13%. Regardless of the number of layers, accuracy was consistently better (6-7%) in the network with the Net-GA block compared to the network without the Net-GA block (Table 2). Unexpectedly, performance was the best with one E2E layer ( Table 3). As for the hidden units of N2G in the Net-GA block, performance was slightly better with 10 units (Table 4). When the number of output dimensionality of the typical convolutional layer was 64, the best performance was observed. Minimal variations of performance were observed with different numbers  (Table 5).

Performance comparison with other competing methods
The proposed BrainNet-GA CNN showed the best accuracy (83.13%) and area under the curve (89.42%).

Discriminative connections
Regarding the connectivity strength between nodes, the top 10 discriminative connections were between the brain regions of the left posterior cingulate gyrus and right posterior cingulate gyrus; left thalamus and right thalamus; left calcarine sulcus and right cuneus; and left Heschl's gyrus and right Heschl's gyrus. The brain regions with highest nodal strength were the left calcarine sulcus, right amygdala, left putamen, right thalamus, and right supramarginal gyrus (Table 7 and Fig. 2).

Functional connectivity analysis
Compared to the HC group, the SSD group exhibited significantly increased FC between the brain regions of the cingulate gyrus and inferior frontal gyrus; left superior frontal gyrus and right inferior frontal gyrus; left angular gyrus and left inferior frontal gyrus; and right cuneus and left calcarine sulcus. Additionally, the SSD group showed decreased FC between the brain regions of the putamen and insular cortex, left Heschl's gyrus and right Heschl's gyrus and left superior temporal gyrus and, right superior temporal gyrus compared to the HC group (Table 8). Partial correlation analysis revealed significant positive relationships between the connectivity of the left anterior cingulate gyrus and left triangularis inferior frontal gyrus and positive symptom subscale, connectivity of the right anterior cingulate gyrus and left orbital inferior frontal gyrus and     negative symptom subscale and connectivity of the right cuneus and left calcarine and positive symptom subscale, general psychopathology subscale and, total score of the PANSS (Additional file 1: Table S1).

Discussion
To overcome the limitation of previous deep learning (DL) methods not capturing graph or network representations of connectome data, we developed the BrainNet-GA CNN by incorporating BrainNetCNN and global covariance pooling into the self-attention mechanism. Advancement of scientific knowledge is described below in terms of methodological and clinical aspects.

Methodological aspects
In the ablation study, favorable performance was reported in the network architecture composed of only two convolutional layers with Net-GA blocks. Relevant studies [8,9] have shown that deep convolutional networks with covariance pooling outperformed other competing methods in a large-scale visual recognition task. Unlike the results of the BrainNetCNN study [5], there was no benefit when stacking multiple E2E layers for our classification task. It seems that the features transformed by multiple E2E layers have a negative effect on extracting higher order features of the next E2N layer, which was not used in their original study [5]. Although ten units performed slightly better, the overall difference was small. The best accuracy was obtained with the BrainNet-GA CNN, compared with other competing methods. Two characteristics of our model may have contributed to this superiority. First, because second-order pooling in the E2N layer captures higher order representations, this may lead to more discriminative features. The covariance matrix produced by second-order pooling is known to improve representation power by quadratic modeling. The i-th row can be interpreted to indicate statistical  dependency between the i-th brain region and all other brain regions. We believe that this technique is a proper approach for neuroimaging data in which correlations or network features between brain regions are crucial. This may be partially supported by the finding that performance dropped a little when second-order pooling was not used in our model. Second, adopting the self-attention mechanism may enhance classification performance by effectively learning graph-wise high-order representations at every convolutional layer and recalibrating filter responses. Because the self-attention mechanism allows covariance pooling to be conveniently plugged into any location of the convolutional layers, it helps build deep CNNs. Unlike the BrainNetCNN, our model is flexible and has a deep structure. This may have contributed to the capture of richer statistics of deep features and improvement of the representation and generalization abilities of deep CNNs. Although implementation of a typical convolutional layer before the Net-GA block was necessary to apply the self-attention mechanism, it may The brighter color is, the greater its importance in the classification; Small circle in sky blue represents nodal strength. The more circle is filled, the greater its importance in the classification; § Red and blue lines represent hyperconnectivity and hypoconnectivity, respectively and darker line means more higher value be criticized that features extracted from this convolutional layer may not contain graph or network representations of our connectome data. However, we regarded this convolutional layer as a local sparse feature extractor since a typical convolution operation with small kernels does not significantly distort the topological characteristic of the correlation matrix.

Clinical implications
Using multivariate DL techniques, neuroimaging-based single-subject prediction of psychiatric disorders has gained increasing attention in recent years. Several studies have employed DL methods to classify SZ and HCs.
Using sMRI data, two studies applied a DBN to the original pre-processed images and obtained accuracy rates of 91% and 73.6%, respectively [10,11]. Applying SAE with weight sparsity control to rsfMRI data, classification of SZ vs. and controls with an accuracy of 85.5% was reported [12]. Other studies with rsfMRI data reported accuracy rates of 79-92% in SZ using autoencoder-based two-or three-stage architecture [13][14][15]. FC analysis with rsfMRI data produces a correlation matrix representing intercorrelation between voxels or regions. However, previous DL methods used in the classification of SZ and HCs did not capture inter-correlation features of the brain network. For example, DNN with weight sparsity control requires a 1D input feature vector, thereby losing spatial information between voxels or regions. Although the CNN used in our previous work [3] can preserve spatial locality with the use of 3D data, this model did not capture topological locality of the brain network. The accuracy of our model, BrainNet-GA CNN, was the best in the classification of SSDs and HCs, outperforming Brain-NetCNN by 6.09%. This suggests our proposed model is an optimally designed approach to capture inter-correlation features of functional connectome data.
Using the gradient-based explanation method, we identified discriminative functional connections between the brain regions contributing significantly to the recognition of SSDs. Among the top 10 discriminative connections, some regions (posterior cingulate gyrus and angular gyrus) were related to the DMN and others to the auditory network (superior temporal gyrus and Heschl's gyrus) and thalamus network. The DMN is involved in complex self-referential stimuli, such as mental time travel, perspective taking, and theory of mind [16]. Accumulating evidence suggests that the DMN is abnormal in SZ, although the results have been mixed [8]. It is of interest that the best discriminative connectivity was an interhemispheric connection in the posterior cingulate gyrus. The posterior cingulate cortex (PCC), a key node in the DMN, has a central role in supporting internallydirected cognition showing increased activity when individuals retrieve autobiographical memories or plan for the future [17]. Increases and decreases of resting FC around the PCC have been reported in both patients and their first-degree relatives [18,19]. In SZ, the auditory cortex is closely associated with auditory verbal hallucinations, which have been proposed to be a result of abnormally elevated resting-state activity in the auditory cortex or from the DMN [20]. The second most discriminative connection was an interhemispheric connection in the thalamus. The thalamus represents an essential hub for cognitive processes and an interface between sensory and motor systems. Brain-wide analysis of FC in SZ revealed that thalamic-related aberrant connectivities were prominent at the chronic stage of SZ [21]. In the task-based fMRI data, the most-significant, stable and discriminative FC changes involved increased correlations between thalamus and other cortical regions [22]. Interestingly, some of these regions (cingulate gyrus, superior temporal gyrus, Heschl's gyrus and cuneus) were also found to be significant in the FC analysis.
Overall pattern of the FC analysis was a more widespread occurrence of increased connectivities in patients with SZ compared with HCs. This seems in contrast with the results of previous studies that global/average connectivity strength was significantly reduced in SZ compared to controls [23,24]. However, it should be noted that there are many other studies reporting increased connectivities in the resting-state DMN [25], thalamosensorimotor link [26] and computational modeling [27]. The most prominent aberrant connections in SZ were between the cingulate and inferior frontal gyri. Our findings are partially supported by the results of Li et al. study [21]. That in patients with first episode SZ, 90% of the FC changes involved the frontal lobes, mostly the inferior frontal gyrus whereas the PCC was one of the areas showing the most prominent changes in chronic SZ. Interestingly, we found positive relationship between the connectivity strength of anterior cingulate gyrus with inferior frontal gyrus and positive or negative symptoms. The anterior cingulate cortex (ACC) is known to be involved in the affect regulation, conflict monitoring and executive control of cognition [28]. The inferior frontal gyrus is involved in attention control and response inhibition [29]. Therefore, it may be speculated that aberrant connectivity in the ACC and inferior frontal gyrus affects their functioning which may in turn lead to development of positive or negative symptoms in SZ. We observed decreased connectivities in the Heschl's gyrus and superior temporal gyrus. Similarly, Venkataraman and colleagues reported decreased connectivity between the temporal cortices bilaterally in SZ [30]. However, no significant correlation was found between these hypoconnectivity and psychopathology. Lastly, increased connectivity between the cuneus and calcarine sulcus was shown in individuals with SZ compared to HCs. The cuneus (Brodmann area 17) receives visual information from the same-sided superior quadrantic retina and is primarily involved in basic visual processing. The calcarine fissure is a deep sulcus located on the medial surface of the occipital lobe. Multiple lines of evidence indicate that there are reduced intrinsic visual cortical connectivity [31] and decreased connection in high-visual network which was found to be correlated with the severity of positive symptoms in SZ [32]. Thus, our findings on the connectivity plus its correlation with psychopathology suggest that impaired visual networks may also contribute to the development of psychopathology in SZ. On the other hand, while medial, superior, and inferior frontal gyri were found to be significant in the FC analysis, these regions were not identified as such in the gradient-based explanation method. In addition, bilateral connections between the same regions were highly prominent in the gradient-based explanation method, whereas unilateral or bilateral connections between different regions were more common in the FC analysis. These discrepancies may be due to the different methodologies used in the two analyses.
This study has several limitations. First, because the number of subjects used for the training and test phases was small, it is unclear how well these findings will generalize to different samples. Validation experiments will also be necessary if the transfer classification model is applied to a clinical population at a new imaging site. Second, although the proportion of patients with an antipsychotic naïve or free state was approximately 33%, most of the patients were on medication at the time of scanning. Antipsychotics are known to affect FC [33], this factor should be controlled if possible. Despite these caveats, this is the first study to apply a graphical approach based on the CNN to functional connectome data in SSDs. Overall, the BrainNet-GA CNN showed high accuracy in the classification of SSDs and HCs, outperforming other competing methods. Some of the discriminative connections were associated with DMN and auditory network brain regions. Furthermore, some of the discriminative connections identified by DL and conventional univariate methods were similar. These results highlight a potential use of the BrainNet-GA CNN in the diagnosis of SZ.

Participants
All participating patients (n = 171) met DSM-IV-TR criteria for schizophrenia spectrum disorders (SSDs) according to the Structured Clinical Interview for DSM-IV (SCID) [34,35]. Individuals with alcohol-or drug-use disorders within the past 6 months, intellectual disability (IQ ≤ 70), current or historical neurological disorders, pregnancy, and claustrophobia were excluded from the study. HCs were required to have no previous or current psychiatric disorders, neurological disorders, or significant medical conditions.

Clinical assessment
The severity of symptoms was evaluated within a week of fMRI scanning using the positive and negative syndrome scale [36] and, the Calgary Depression Scale for Schizophrenia 7 . The PANSS and CDSS were administered by trained psychiatrists.

Image acquisition and preprocessing
Resting-state functional and structural MRI (rsfMRI and sMRI) data were obtained at the Jeonbuk National University Hospital on a 3 T Verio scanner (Siemens Magnetom Verio, Erlangen, Germany) using a 12-channel standard quadrature head coil. We collected a 5-min resting-state scan consisting of 150 contiguous echo-planar imaging functional images (TR: 2000 ms; TE: 30 ms; flip angle: 90°; FOV: 240 mm; image matrix: 64 × 64 mm; voxel size = 1.0 × 1.0 × 1.0 mm [3]; 176 slices). MRI data preprocessing was conducted in a standard way using the Statistical Parametric Mapping software package, ver 12. The criteria for excessive head motion were translation > 2 mm or rotation > 2° in any direction. Participants for whom more than 10% of volumes showed excessive head motion were excluded from the analysis. The linear trend was removed through the time course, and the band-pass filter (0.008 < f < 0.09 Hz) was applied.

Functional connectivity analysis
Time series of the voxels within the ROI were averaged to generate the regional time series for the automated anatomical labeling (AAL) atlas. The FC matrix was computed by correlating time series data between every pair in the AAL atlas using the CONN toolbox. Group comparison was performed using ANCOVA with education as covariate. For the contrast map, we applied the cluster-level extent threshold of p < 0.01, which was corrected for multiple comparisons using the false discovery rate (Additional file 2). Partial correlations were carried out controlling for age, sex, education, duration of illness, chlorpromazine equivalent doses and head motion (framewise displacement) on the relationship between the rsFC z-values of brain regions showing significant between-group differences and PANSS. The significance level was set at a cluster-level of p < 0.05, and data were not corrected for multiple comparison because of the exploratory nature of the evaluation.

Net-global covariance pooling-attention block
The brain FC can be expressed as the complete graph G = (E, B), where B is a set of nodes reflecting 116 brain regions and E is a weighted adjacency matrix of edges. To capture graph representations of a functional connectome, we adopted graph-wise convolutional filters in the BrainNetCNN [5], which were composed of E2E, E2N, and N2G. However, the block was modified by applying two more methods, i.e., second-order pooling and squeeze-excitation network as a self-attention model, and was named the Net-Global Covariance Pooling-Attention (Net-GA) block (Fig. 3).
Unlike the BrainNetCNN [5] second-order pooling was inserted before the row convolutional filter in the E2N layer. To this end, the 3D output tensor x ee ∈ R h×w×c ′ of the E2E layer was reshaped to the two-dimensional matrix F re : x ee → x ee ′ ∈ R h×M where the i-th row indicates the representations of i-th brain regions. Given the matrix X ee ′ consisting of M-samples and h-dimensional features, the sample covariance matrix of X ee ′ can be written as follows: (1) where I is the M × M . identity matrix, J represents the M-dimensional vector, which is composed of one, and T denotes the matrix transpose. We performed a row-wise group convolutional filter by considering each row as a group, F rconv : x cov → x en ∈ R h×1 , to maintain characteristics of the functional connectome data. Through the proposed E2N layer, the input tensor from the E2E layer was transformed into region-wise sparse representations corresponding to the number of brain regions, which can be defined as follows: The Net-GA block is a computational module that can build the enhanced tensor X ∈ R h×w×c from its original tensor X ∈ R h×w×c , which can be defined as foows: where F SE = F E2N • F E2E is the squeeze function and F EX = F N2G denotes the excitation function. For the squeeze operation, an input tensor was fed to the E2E (3) F GBCP = F EX • F SE : X →X layer to encode the edge strengths over a pair of brain regions. To decrease the computational cost of secondorder pooling at the following layer, we also reduced the number of channels from c to c ′ , and the E2E layer of the proposed squeeze operation is defined as follows: For the excitation operation, we employed the N2G layer. This aims to summarize the responses of all brain regions into a single response. In the N2G layer, the dimensionality of input vector x en was decreased by passing the bottleneck layer with a reduced ratio. We then increased the vector from the reduced size to its original size, and activated the output vector using the sigmoid function, F fc : x en → x ng ∈ R h×1 . The final enhanced tensor X computed by the proposed excitation operation can be obtained by (4) F E2E : X → x ee ∈ R h×w×c ′ (5) F N2G = F rmul : X, x ng →X ∈ R h×w×c where F rmul denotes a row-wise multiplication between the input tensor X = x 1 , x 2 , . . . , x h ∈ R w×c and the weight vector x ng = [a 1 , a 2 , . . . , a h ] . The output tensor X was highlighted, helping to boost representation discriminability.

BrainNet-GA CNN architecture
The BrainNet-GA CNN consists of a typical convolutional layer, Net-GA block, and fully connected classification layer (Fig. 3). For a detailed description, see the Additional file 1.

Experiments
We conducted an ablation analysis on the proposed BrainNet-GA CNN and quantitative performance comparisons with competing methods using the nested tenfold cross validation strategy. To avoid possible bias caused by the random dataset partitioning, the crossvalidation was repeated 10 times independently, and the average score was reported. Hyperparameters such as varying regularization factors, weight decay, and network architecture, were empirically tuned and optimized. We optimized two important hyperparameters, an initial learning rate and a weighting factor of L1 regularization, using Bayesian optimization. The performance was evaluated using accuracy, sensitivity, and specificity. Also, we plotted the receiver operating characteristic curve of BrainNet-GA CNN and other competing methods, including SVM, fully connected neural network, CNN, squeeze and excitation network (SENet) [8], and Brain-NetCNN [5]. For a detailed description, see the Additional file 1.

Discriminative connections
To discover discriminative functional connections between the brain regions that make significant contributions to the recognition of SDDs, we used the gradientbased explanation method [37]. To obtain an explainable saliency map, after choosing a target class (SSDs or HCs), we fed validation data to the explanation method, and entire saliency maps were linearly integrated and normalized. Connectivity strength between the nodes and nodal strength (sum of edge weights attached to a node within a network) were estimated.