 Methodology article
 Open Access
 Published:
Technical considerations of a gametheoretical approach for lesion symptom mapping
BMC Neurosciencevolume 17, Article number: 40 (2016)
Abstract
Background
Various strategies have been used for inferring brain functions from stroke lesions. We explored a new mathematical approach based on game theory, the socalled multiperturbation Shapley value analysis (MSA), to assess causal function localizations and interactions from multiple perturbation data. We applied MSA to a dataset composed of lesion patterns of 148 acute stroke patients and their National Institutes of Health Stroke Scale (NIHSS) scores, to systematically investigate the influence of different parameter settings on the outcomes of the approach. Specifically, we investigated aspects of MSA methodology including the choice of the predictor algorithm (typology and kernel functions), training dataset (original versus binary), as well as the influence of lesion thresholds. We assessed the suitability of MSA for processing real clinical lesion data and established the central parameters for this analysis.
Results
We derived general recommendations for the analysis of clinical datasets by MSA and showed that, for the studied dataset, the best approach was to use a linearkernel support vector machine predictor, trained with a binary training dataset, where the binarization was implemented through a median threshold of lesion size for each region. We demonstrated that the results obtained with different MSA variants lead to almost identical results as the basic MSA.
Conclusions
MSA is a feasible approach for the multivariate lesion analysis of clinical stroke data. Informed choices need to be made to set parameters that may affect the analysis outcome.
Background
Lesion analysis reveals causal contributions of brain regions to mental functions, aiding the understanding of normal brain function and rehabilitation of braindamaged patients. Historically, brain lesions were one of the few available sources of information by which functions of the human brain could be inferred. Although lesion inferences have made an enormous contribution to understanding the human brain and have laid the basis for attributing mental functions to specific brain regions [e.g. 1], they also have several drawbacks, such as the difficulty to infer function on the basis of the behavior of individual patients, the principal assumption of the localization of function, as well as the plasticity of the injured brain [2]. Today, a broad range of additional techniques exist to investigate the functions of the living brain through the correlation of behavior and cognition with brain activity, as shown by electrophysiology and functional imaging. In this context, lesion inferences, which link behavioral functions directly and causally to a neural substrate, still have an important role in modern neuroscience [2].
Despite the exciting promise of lesion inferences for identifying causal functional contributions, they are limited by several conceptual difficulties. Young et al. [3] discussed the potential unreliability of classical inference methods, such as single and double dissociations. In a single dissociation, a lesion of brain structure a disrupts function A but not function B; suggesting that functions A and B have some independence. Double dissociations arise when function A is disturbed by lesion of a and not of region b, while function B is disturbed by lesioning b and not a. This observation also leads to the conclusion of independent functions A and B, and their attribution to lesioned regions a and b, respectively [4]. The principal problem of single dissociations is the impossibility to assess if an apparently specific deficit arises from the impairment of a specific, localized process or from more general lesion impairment. In fact, a behavioral deficit after a lesion could be evidence for an interdependent hierarchy of functions in which the lesioned region plays a contributing role, rather than evidence for a localization of the function. Although double dissociations appear to represent a conceptual improvement over single dissociations in correctly ascribing functions to brain regions, they can also result in incorrect conclusions. Striking examples in this respect are socalled paradoxical lesion effects, such as the reversal of visual hemineglect during bilateral cortical or collicular inactivation in the cat brain [5]. An extensive variety of paradoxical lesion effects has been documented [6, 7], including two major types of paradoxical functional facilitation (PFF): restorative PFF and enhancing PFF. Restorative PFF arises when damage to intact brain tissue returns a previously subnormal level of functioning back to normal. One example is the Sprague effect [8], where superior collicular lesions can result in a (partial) restoration of visual functioning following an initial visual cortical lesion of the contralateral hemisphere. By contrast, an enhancing PFF occurs when a subject with apparent nervous system pathology or sensory loss performs better than healthy control subjects on a particular task. These paradoxical effects arise because the brain is not just a collection of independent functional processors, but a complex system in which brain function emerges from the multiple interactions of distributed yet interlinked brain regions. Therefore, the conceptual problems linked to single or double dissociations also extend to higherorder inferences (e.g., triple dissociation), and it becomes apparent that, in a strict sense, the true causal contributions of brain regions to behavior can only be correctly inferred from evaluating all combinations of intact and injured brain regions together with their behavioral scores.
In this context, a number of traditional strategies as well as current approaches for lesion inference, all computed on a voxelbyvoxel basis, were reviewed by Rorden and Karnath [2]; such as voxelbased morphometry (VBM) [9, 10], BrainVox [11], voxelbased lesionsymptom mapping (VLSM) [12], and voxelbased analysis of lesions (VAL) [13]. Kinkingnéhun et al. [14] introduced a method called anatomoclinical overlapping maps (AnaCOM) which uses, in contrast to tradition statistical approaches for voxelwise lesion behavior mapping (LBM, [15]), neurologically healthy individuals as control population instead of data from neurological patients [14, 15]. All statistical approaches mentioned above have, as a main drawback, the need to control for false positives [16]. Recently, Smith et al. [17] introduced a new approach, called multivariate pattern analysis (MVPA), to predict the presence or absence of spatial neglect from brain injury maps. Specifically, the authors used two machinelearning techniques, based on linear and nonlinear support vector machines (SVMs), to classify individuals based on structural brain scans with right hemisphere lesions. A recent study of ischemic stroke patients by Forkert et al. [18] demonstrated that information about stroke location (specifically, lesion overlap measurements) can improve the prediction of functional outcome (as measured by the modified Rankin Scale) by multiclass SVMs.
Principally, lesion inference approaches can be classified on the basis of the univariate or multivariate nature of the used method. Multivariate analysis methods account for inherent dependencies among regions of interest. Such dependencies may lead to substantial misinferences of univariate analysis methods, as demonstrated by Mah et al. [19] through groundtruth lesion simulations. The majority of lesion mapping approaches to date has applied univariate regression models [20]. Multivariate approaches based on machine learning, such as MVPA, were introduced to lesion analyses by the work of Smith et al. [17]. Similarly, Zhang et al. [21] developed a multivariate lesion symptom mapping approach using a machine learningbased multivariate regression algorithm. The authors showed that, in comparison with VLSM, the new approach has higher sensitivity for identifying the lesionbehavior relations, both on synthetic and real datasets. A recent study by Corbetta et al. [22] proposed a multivariate approach based on machine learning to examine lesionbehavior relationships across multiple domains in a large sample of patients.
In contrast to these approaches founded on machine learning, multiperturbation Shapley value analysis (MSA) [23] has been suggested as an alternative inference approach based on game theory [24]. MSA represents a mathematical method to assess causal function localization from multiple perturbation data. It considers brain regions as network elements or players that interact in a game to achieve a particular behavior. The MSA computes the contributions of these elements and also their interactions from a dataset of multiple lesions. The MSA approach has been applied to neuroscience [25, 26], biochemistry, and genetics [27, 28]. In the context of clinical lesion analysis, Kaufman et al. [29] applied MSA to lesion data and line bisection test scores of 23 right hemisphere stroke patients. The study focused on 11 grey and white matter regions and used a predictor (specifically, a knearest neighbor algorithm) trained on the patient injury data to obtain the line bisection performance for all injury configurations. The approach revealed that among the 11 regions of interest, only four (the supramarginal and angular gyri of the inferior parietal lobule, the superior parietal lobule, the anterior part of the temporoparietal junction, and the thalamus) had a pivotal contributing role for the given task. While this proofofprinciple paper demonstrated the practical applicability of MSA to clinical image data, it was based on a small sample of 23 cases and lowresolution CT images. In a recent paper by our group [24], we applied the MSA approach, in comparison with other lesion inference methods, to a large multicentre dataset of stroke patient data, to investigate functional contributions of eight largescale bilateral volumes of interest (VOIs). The dataset comprised lesion patterns of 148 acute stroke patients together with their neurological deficits, assessed by the National Institutes of Health Stroke Scale (NIHSS, [30]). The analysis, which revealed contributions to essential behavioral and cognitive functions particularly by subcortical structures, contributed to the interpretation of NIHSS in clinical practice as well as clinical trials.
MSA is a novel computational method, which has not yet been explored in sufficient detail from a technical perspective for the application to stroke lesion inference. In the present study, we applied MSA to the same large sample of patient lesion data used previously [24], to investigate open methodological and technical aspects of the MSA approach in lesion inference and systematically determine the influence of different parameter settings on the results of this approach. More precisely, the main goals of this study were to, first, compute a sensitivity analysis on the parameters characterizing the preparation of the dataset for MSA, that is, lesion definition and lesion prediction, second, identify the most important parameters for this analysis, third, investigate MSA methodological variants and, finally, assess the suitability of MSA for processing real clinical lesion data.
Methods
Behavioral and lesion image data
For the present study, we used a large multicentre dataset of stroke patients, described in a recent paper by our group [24]. The used data comprise a population of acute stroke patients included in a multicentre observational study, designed to analyze the combined use of FLAIR (fluid attenuated inversion recovery MR imaging) and DWI (diffusionweighted MR imaging) for identifying patients with acute ischemic stroke within 4.5 h of symptom onset (the PREFLAIR study [31, 32]). All patients in this study were studied within 12 h of witnessed stroke onset and severity of neurologic deficit on admission was assessed using the global NIHSS. The NIHSS, which is a rating scale resulting from a standardized neurological examination, quantifies symptom severity in acute stroke [30] by scoring 11 items representing specific abilities, with scores ranging between 0 (no symptoms, correct performance of task) and 2–4 (maximum symptom severity for corresponding item). These items include the: Level of Consciousness, Horizontal Eye Movement, Visual field, Facial Palsy, Motor Arm, Motor Leg, Limb Ataxia, Sensory, Language (Aphasia), Dysarthria, Extinction and Inattention. Higher NIHSS scores indicate a more severe impairment. A global score is calculated by summing up the individual score values. Details of imaging parameters and clinical characteristics for this study cohort were described previously [24].
Lesion image processing
For quantitative lesion analysis, the same processing pipeline was used as described in [24]. Briefly, the lesions were segmented in each DWI dataset using a semiautomatic intensitybased method. After lesion definition, the 152 MNI brain atlas [33] was registered to each DWI dataset using a surfacebased registration method. The resulting transformation was then used to transform the corresponding MNI atlas brain regions to each DWI dataset, which were then used for lesion overlap quantification (in %). The study focused on eight bilateral VOIs, defined by the MNI structural atlas: caudate (CAU), insula (IN), frontal (FR), occipital (OCC), parietal (PAR) and temporal lobes (TEM), as well as putamen (PUT), and thalamus (TH), which covered the whole brain. The overlap (in %) between each of the transformed 16 anatomical structural regions as defined in the MNI structural atlas and the patientspecific acute ischemic stroke lesion was calculated for each patient. The resulting dataset was composed of 148 patient cases with different patterns of lesioned VOIs (77 leftonly, 72 rightonly, one patient without lesions across all VOIs who was included in both hemisphere sets) and the corresponding global NIHSS values of the patients.
Preliminary analysis
As in Zavaglia et al. [24], we first employed two simple approaches of lesion inference to assess the relative lesion size and frequency, using Lesion Overlap and Median VOI Lesion Overlap. The first method simply calculates the percentage of patients who display a lesion in each voxel of the MNI brain atlas (in %). The second method is similar to the first one, but is applied to VOIs instead of single voxels. This approach shows the median overlap between the 16 anatomical MNI brain atlas regions and the patientspecific acute ischemic stroke lesions. For further details see Zavaglia et al. [24].
Multiperturbation Shapley value analysis: general approach
As an alternative to traditional inference approaches, the MSA is a rigorous method for assessing causal function localization from multiple perturbation data. It addresses the challenge of defining and calculating the contributions of network elements from a dataset of multiple lesions (multiple perturbation experiments) and their performance scores. It objectively quantifies not only the contributions of network elements, but also their interactions. MSA is based on coalitional game theory [34]. In general, the linked system elements (in this study, the volumes of interest) can be seen as players in a game, and a perturbation configuration represents a subset of elements, which are perturbed concomitantly. The set of elements, which are left intact, represents the coalition of players. For each configuration, the performance of the system (here, the inverse of the NIHSS), which can be seen as the worth of the coalition, is measured. The aim of the analysis is to assign values that represent the elements’ contribution, or importance, for the overall function. The contribution of an element represents the worth of the coalition which contains the element, in relation to the worth of coalitions that do not contain it. The formal procedure is described below.
If we consider a system composed of N = {1, … , n} elements that perform a task, we can define a coalition S, where S ⊆ N, and a performance score v(S), which is a real number representing the performance measured for the perturbation configuration in which all the elements in S are intact and the rest perturbed. The definite value in game theory and economics for this type of coalition game is the Shapley value [34]. The marginal importance of player i to a coalition S, with i ∉ S, is represented in Eq. 1
The Shapley value of each player i ∈ N is defined by Eq. 2 where \({{\mathcal{R}}}\) is the set of all n! orderings of N and S _{ i }(R) is the set of players preceding i in the ordering R. If we assume that all the players are arranged in some order, all orders being equally likely, the Shapley value can be interpreted as the marginal importance of a player i to the set of players that precede him. We define a configuration to be an indicator vector for the set of unperturbed elements, that is a binary vector of length n, with c _{ i } = 1 if i ∈ S or c _{ i } = 0 if i ∉ S. For a detailed, formal description of MSA see [23].
When all possible 2^{n} perturbation configurations are known, the Shapley value can be computed either using Eq. 2 where the summation runs over all n! ordering of N, or it can be computed as a summation over all 2^{n} configurations, properly weighted by the number of possible ordering of the elements (Full Information MSA).
Multiperturbation Shapley value analysis: method variants
Predicted MSA
Frequently, the complete set of performance scores for all combinations of the binary states of a set of regions required for MSA is not available, due to the difficulty of experimentally accessing all perturbation configurations. In those cases, a prediction model trained on the available set of configurations and performance scores can be used to predict the performance scores corresponding to all binary configurations.
In this study, a total of 2^{n} = 256 binary lesion configurations existed (where each VOI can be lesioned, “0”, or intact, “1”, and n = 8 is the number of VOIs for each hemisphere) and correspondingly, 256 performance scores were required for the MSA (see “Multiperturbation Shapley value analysis: general approach” section). However, the original input dataset used in this work was composed of only 77 graded lesion configurations (describing % lesion overlap) and corresponding performance scores for the eight left VOIs and 72 graded lesion configurations and corresponding performance scores for the eight right VOIs. Therefore, we used a machine learning model trained on the available input dataset (see “Multiperturbation Shapley value analysis: prediction of unknown performance scores” section for details) to predict the performance scores of all possible 2^{n} = 256 binary lesion configurations. After the prediction, all performance scores corresponding to the required 2^{n} binary configurations for each hemisphere were available, and it was possible to compute the Shapley value with the full information MSA (predicted MSA).
Estimated MSA
In studies where the number of system elements is too large to enumerate all configurations in a straightforward manner, the MSA can alternatively sample orderings and calculate an unbiased estimator of the contribution of each element (estimated MSA) as shown in Eq. 3, where \({\hat{\mathcal{R}}}\) represents a randomly sampled set of permutations.
It is important to note that a perturbation configuration may appear in different permutations. Therefore, the number of new configurations (and corresponding performance scores) for each sampled permutation decreases as more permutations are sampled [26]. The multiperturbation configurations that are used in the estimated Shapley value method depend on the sampled orderings. However, the available dataset of performance measures for some set of perturbation configurations does not necessarily match the ones corresponding to a random permutationconfiguration sample. In this case, as for the predicted MSA, a prediction model is trained using the available given set of perturbation configurations and corresponding performance scores and it is used to predict the performances for any perturbation configuration generated by the sampled permutations (estimated MSA). In studies where the number of VOIs (n) is relatively small, there is no considerable advantage in using the estimated MSA, but where the number of VOIs is much larger, its use becomes fundamental, because the number of configurations and corresponding performance scores generated by the sampled permutations is much smaller than 2^{n}. The estimated MSA approach is suitable for large networks consisting of up to 100 network elements [26].
Multiperturbation Shapley value analysis: prediction of unknown performance scores
Choice of machine learning predictor
There is no prespecified prediction model for MSA, and it is an important task to select the bestsuited algorithm for predicting unknown performance scores on the basis of the specific configuration of available performance values. This aspect is important, because it may affect the results of the subsequent MSA analysis. Among a large number of available classifiers, one can consider, for example, naive Bayes classifiers, regression trees, nearest neighbor algorithms, random forests, and support vector machines (SVM).
The knearest neighbor (kNN) approach has been used before for a similar purpose by Kaufman et al. [29], who applied it in their MSA study on spatial neglect patients. The kNN method is a relatively simple interpolation algorithm in which an object is assigned a value based on the classes (i.e., functional performance score) of its k nearest neighbors, for instance, based on Euclidean distance. Alternatively, support vector machines have been found to be very powerful, especially in case of nonlinear classification problems. Support vector machines can be applied to classification or regression problems [35]. Specifically, a supervised learning algorithm infers a function from labeled training data consisting of a set of training examples. The inferred function can be used for mapping new examples; that is, the algorithm can determine the class labels for unseen instances. In the present approach, we predicted performance scores space using the multiclass SVM (where the number of categories is larger than two) implemented in LIBSVM [36]. A sensitivity analysis on the settings of the SVM (i.e., its kernel function) was performed to select the bestsuited parameters for our dataset.
Originalgraded versus thresholdedbinary dataset
Two options were investigated to predict the performance of the 256 binary configurations, using either the originalgraded dataset or alternatively a thresholdedbinary dataset (where “0” is lesioned and “1” is intact, created from the graded dataset after thresholding) as training dataset. Both strategies have their inherent advantages and drawbacks. On the one hand, using the originalgraded dataset utilizes the information of the original data in the training without information loss due to binarization. However, the training and test data do not have the same features, since the predictor is trained with graded data and then tested with binary data (256 binary lesion configurations required by the MSA). On the other hand, using a binary dataset (after thresholding) for training, the type of training and test data is identical, at the cost of major drawbacks: the necessary choice of an arbitrary threshold for the binarization, and the consequent loss of information. Particularly, after binarization, the number of unique configurations changes, because some graded configurations become equal to each other at a binary level, but have different associated behavioral scores and also different graded lesion patterns.
The binarization threshold may be global, that is, using a unique value for all VOIs, or individual, that is, computed individually for each VOI. In our study, we explored both approaches. As a first alternative, we binarized the originalgraded dataset at different global thresholds by defining each VOI as lesioned (“0”) or intact (“1”) depending on whether relative lesion size was larger or smaller than a given global threshold. Using a leaveoneout cross validation scheme, we trained the SVM respectively with the originalgraded dataset graded from 0 to 1, where “0” represents a complete lesion and “1” a completely intact region (since the original overlap values are defined opposite we had to calculate values as 1—originalgraded dataset), and with the corresponding thresholdedbinary dataset at different thresholds. Each SVM model was evaluated with leaveoneout cross validation, using the thresholdedbinary dataset obtained with different thresholds. In this way, for each binarization threshold and for both types of training dataset (originalgraded or thresholdedbinary), we computed the root mean squared error (RMSE) in terms of the difference between the real and the predicted performance score. We focused on the SVM variants with the lowest RMSE for both types of training. In turn, the same analysis was repeated for three individual thresholds, computed respectively as the first (0.25), second (0.5, median threshold) and third (0.75) quartile of the nonzero relative lesion size individually for each (individual) VOI. For the individual thresholds, we also computed a measure of accuracy. Specifically, we considered the prediction successful when the absolute value of the difference between the predicted and the real performance score was not larger than a maximum tolerance error (=3).
Results
Lesion definition
Lesion size and NIHSS
Figure 1 represents the relative lesion size and the associated global NIHSS together with color bars indicating the range of variation in lesion size and NIHSS, respectively. The NIHSS is graded from 0 to 21, where zero means that the patient shows no behavioral deficit and a score of 21 indicates the severest impairment (the range is given for the current sample; higher NIHSS scores are possible). The figure demonstrates the segregation into left and righthemispheric lesions and the correlation of lesion size with NIHSS. Large structures, such as the cortical lobes, typically only have small relative lesions, while relative lesion sizes are larger for small (subcortical) structures.
Lesion overlap and median VOI lesion overlap
Figure 2 shows the outcomes of the preliminary analysis approaches described in “Preliminary analysis” section, applied to the left and right lesions dataset separately and represented in the reference space of the MNI atlas (top row). The first, simple and straightforward assessment of the data is by the relative Lesion Overlap, shown in the second row on the MNI brain atlas (using neurological convention). This representation shows that all VOIs involved in the study are damaged to different extent. Due to these overlapping lesion patterns, it is difficult to establish a simple relationship between the lesions and the behavioral scores (NIHSS). As a second method to visualize the lesioned regions, we used Median VOI Lesion Overlap, Fig. 2, third row, showing the relative (percentage) infarction computed individually for each VOI. This method clearly indicates that, on a relative scale, the subcortical regions are most affected, especially in the right hemisphere. In Fig. 3 for each VOI in the left and right hemisphere, we also plotted the relative percentage of lesioned voxels for all patients with the corresponding median VOI infarction.
Lesion prediction
Originalgraded versus thresholdedbinary dataset and linear versus RBFkernel SVM: global threshold of binarization
In order to assess the drawbacks and advantages of the two types of training sets, we present both prediction options with the corresponding results. In Fig. 4, we show the changes of RMSEs for left and right VOIs separately, obtained with thresholdedbinary and originalgraded training for two SVM classifiers, depending on the global threshold of binarization. Specifically we used a linear kernel (kernel type t = 0) SVM, with parameters set to default value (SVM type s = 0, cost c = 1), and a radial basis function (RBF) kernel (kernel type t = 2), with all parameters set to default value except for the gamma parameter (SVM type s = 0, cost c = 1, gamma g = 1). In panel e and f of Fig. 4, the number of unique configurations after the binarization process is illustrated. It becomes apparent that it is not straightforward to select a global threshold of binarization that gives the best results for both types of trainings, particularly because it is not sufficient to consider the RMSE, as also the numbers of unique (useful) configurations after binarization have to be taken into account (as anticipated in “Originalgraded versus thresholdedbinary dataset” section). A threshold of 50 %, for example, yields a good RMSE, but, at the same time, only 4 and 6 unique configurations are available for the left and right hemisphere, respectively, since at this threshold, only bilateral putamen and insula have lesioned elements. In this context, an areaspecific threshold for binarization, specifically and separately tailored for each VOI despite the variation in relative lesion size, could represent a good alternative.
Originalgraded versus thresholdedbinary dataset: individual threshold of binarization
Table 1 shows the results on the RMSE and accuracy (maximum tolerance error = 3) computed with individual thresholds, for left and rightdamaged patients datasets respectively, with both types of training and with the linearkernel SVM (for brevity, we did not report the results of the RBFkernel SVM). The median value threshold represents a good compromise between RMSE and accuracy, for both graded and binary training, as well as the number of useful configurations. Moreover, the classification accuracies obtained with the median threshold are considerably higher than the statistical chance levels, which were computed in the same way as prediction accuracy (maximum tolerance error ±3), but instead of the predicted scores we used NIHSS values that were randomly permutated for all patients. We repeated this procedure 100 times and obtained a mean chance level accuracy of 36 % for left regions and 34 % for right regions.
Training of the SVM using the originalgraded values instead of the thresholdedbinary values intuitively promises a better prediction, due to making fuller use of the continuously defined variables. Indeed, RMSE and accuracy for graded training were slightly better than for binary training. However, this procedure incurs the problem that the training data and the predicted data are of different types (graded versus binary lesions), which leads to biases in the prediction. Specifically, the prediction of NIHSS based on graded lesion patterns led to a discontinuous spread of predicted values (cf. Fig. 7 as well as Figs. 5 and 6). While the RMSE and accuracy measures suggested that these values were within a useful range, the apparent artificial distribution of the values indicated a prediction bias. For this reason, for subsequent analyses we preferentially used a thresholdedbinary dataset for the training of a linearkernel SVM, where the binarization was implemented through a median threshold for each VOI. Figure 7 shows predicted performance scores obtained with thresholdedbinary and originalgraded training. Specifically, we represent the 256 binary configurations required for the MSA [sorted from all lesioned (blue = 0) to all intact (red = 1)], with the corresponding mean value of the performance scores. The performance scores were predicted with the linearkernel SVM, for left and right VOIs, based on both thresholdedbinary and originalgraded dataset training and the leaveoneout cross validation. The color map range of predicted scores in the color bars is the same for panels (a) and (b). It is interesting to note that there is a substantial difference between left and right brain regions. For right regions, the predicted scores tend to vary in a smaller range compared to left ones. This observation can be ascribed to the fact that the scores are based on a battery of functions mostly associated with the left hemisphere. Moreover, it is clear that the binary training, for both hemispheres, leads to a more even spread of predicted scores, while with the originalgraded training, the predicted scores tend to exhibit only a few values of the possible global NIHSS range, and the correlation between the number of intact regions and the performance score is less evident.
MSA contributions
MSA: general consideration
In this section, we present the main results obtained with the MSA analysis applied separately to left and right VOIs. Specifically, we show the differences between contribution values obtained with thresholdedbinary and originalgraded training, between SVM classifiers (radial basis function or linear kernel), and between MSA variants (predicted MSA and estimated MSA).
Predicted MSA: originalgraded versus thresholdedbinary dataset and linear versus RBFkernel SVM
Here, we show the normalized mean MSA contribution values for the inverse NIHSS, using the linear and the RBFkernel SVM, with thresholdedbinary and originalgraded training datasets, to compute unknown performance scores (predicted MSA). As left and righthemispheric lesions were strictly separated in the present patient sample, contributions of VOIs were computed separately for the left and right hemisphere. Standard deviation bars are derived from the leaveoneout cross validation during the prediction of performance scores. Specifically, we predicted the 256 unknown scores 77 times (for leftbrain VOIs) and 72 times (for rightbrain VOIs) and consequently computed the same number of MSA contribution values. Positive contribution values indicate that regions contribute positively to the performance of a task. Thus, if they are lesioned, the performance is lowered. By contrast, a negative contribution value means that the lesioning of these regions is beneficial for the performance of the task. We also show the normalized contribution values obtained with the estimated MSA, for the binary training and linear kernel SVM. In Fig. 8, we show the comparison between the contributions derived from the training with the originalgraded and thresholdedbinary dataset using the linear kernel SVM, for left and right VOIs respectively. For the binary training, subcortical regions, such as caudate and left insula, together with parietal and frontal lobes, were inferred to make the strongest contributions to brain functions reflected by the NIHSS. Moreover, all contributions (except for the right temporal lobe) were significantly different from zero, with negative contributions coming from right putamen and left thalamus (same results shown in [24]). Both for left and right VOIs, there were differences between the two training methods. For right VOIs the difference is more evident, especially for putamen and insula that seem to have a much stronger effect when the originalgraded training is used.
In Fig. 9 we show the same quantities as Fig. 8, but generated using the radial basis function kernel SVM. The difference between the two training methods for the left VOIs is less evident than for right VOIs. Interestingly, the results obtained with the two SVM kernels do not differ considerably. We computed the rank correlation between the eight mean contribution values obtained with thresholdedbinary training, with linear and RBF kernel: for left VOIs ρ = 0.42 (pvalue = 0.30) and for right VOIs ρ = 0.55 (pvalue = 0.17). We also computed the rank correlation between the eight mean contribution values obtained with originalgraded training, with linear and RBF kernel: for left VOIs ρ = 0.17 (pvalue = 0.7) and for right VOIs ρ = 0.78 (pvalue = 0.03) .
Estimated MSA: thresholdedbinary dataset and linearkernel SVM
Figure 10 shows the contribution values obtained with the estimated MSA after simulations of 100 orderings of the set of multiperturbation experiments. We used the linear kernel SVM with thresholdedbinary training (individual median threshold) to predict the performance scores for all the perturbation configurations dictated by the sampled permutations. The output shows contributions which are almost identical to the contributions obtained with the predicted MSA (rank correlation is ρ = 0.98 for left VOIs and ρ = 0.95 for right VOIs, see also black contributions in Fig. 8). Also, smaller numbers of orderings (i.e., 50) work quite well (results not shown). In this study, where the number of VOIs is relatively small, there is no big advantage in using the estimated MSA, but in studies where the number of VOIs is much larger, its use becomes essential (see “Estimated MSA” section).
Discussion
In this paper, we investigated in detail the MSA methodology and showed its application to a large dataset of stroke patients. The dataset was used in a previous study by our group [24], but here we focused on clarifying methodological and technical questions of the MSA approach in lesion inference and identifying the most important parameters for the analysis. In particular, we investigated parameters involved in the preparation of the dataset for MSA, such as the binarization threshold (global or individual) for the graded lesion dataset, the kernels of the predictor and the consequent bestsuited training set for the prediction of unknown behavioral scores. We also investigated MSA methodological variants, such as the estimated MSA, that may be relevant for future studies involving more finely resolved ROIs.
Data conditioning, choice of training set and predictor
One of the main technical limitations of the classic MSA method is the preparation of a complete lesion dataset in binary format, consisting of functional scores for all possible configurations of intact and lesioned states of VOIs, as required by the algorithm. The full information MSA approach requires complete functional information for all 2^{n} binary brain state configurations, where n is the number of regions. Thus, ideally, one would have available performance values for 2^{n} patients, whose lesion patterns are all different from each other. In clinical practice, this requirement is unrealistic. Even the present extensive study, which included 77 left and 72 rightbrain damaged patients, did not reach the required number of 256 distinct cases for each hemisphere, and a predictor had to be trained to obtain the full set of performance scores corresponding to the 2^{n} possible binary lesion configurations (predicted MSA). Moreover, in addition to the choice of the predictor (see “Choice of machine learning predictor” section) an extensive study of the bestsuited training dataset is necessary. We investigated two training sets: the originalgraded dataset and the thresholdedbinary dataset. In clinical practice, lesion information at the VOI scale is not a binary category indicating if the region is entirely lesioned or intact, but is provided as a graded percentage of lesioned voxels of a region. In this context, it is important to note that using the originalgraded dataset for training utilizes the information of the original data in the training without loss due to binarization. However, the training and test data do not have the same features, since the predictor is trained with graded data and then tested with binary data (256 binary lesion configurations required for the MSA). By using a binary dataset (after thresholding) for training, the type of training and the test data are identical, but the approach involves the choice of a threshold for the binarization, as well as the potential loss of information from thresholding. Particularly, after binarization the number of unique configurations may change, because some graded configurations become equal to each other at the binary level (i.e., these configurations are collapsed into each other), but have different associated behavioral scores and also different graded lesion patterns.
Each of the previous steps has to be carefully considered, since there is no ideal, objective method for binarizing graded data, and no predefined predictor for generating missing behavioral data. In the present study, we systematically investigated the binarization threshold and the parameters of the machine learning predictor (focusing on SVMs with linear or radial basis function kernel). In order to find the best solution for the present dataset, we performed a sensitivity analysis on global and individual thresholds for binarization and compared the results of the prediction with both originalgraded training and thresholdedbinary training in terms of accuracy, computed by minimizing the error in the prediction of behavioral scores. The training of the SVM using the originalgraded values appeared to show a better prediction, due to the use of the continuously defined variables, as confirmed by RMSE and accuracy, which were slightly better than for thresholdedbinary training. However, the use of training data of different type from the testing data led to a discontinuous spread of predicted values, as shown in Fig. 7.
The results computed with MSA by means of a thresholdedbinary dataset for the training of a linear kernel SVM (threshold = median value of all nonzero percentages of lesioned voxels) showed that contributions were all significantly different from zero (with the exception of the right temporal lobe) and that subcortical regions, such as bilateral caudate, and insula, together with the parietal and frontal lobe, were inferred to make the strongest contributions to essential brain function as reflected by the NIHSS. Interestingly, MSA revealed also negative contributions, specifically from the right putamen and left thalamus (for interpretation see also [24]). The comparison of contributions value obtained with the originalgraded training instead of thresholdedbinary training yielded some differences, especially for the linear kernel SVM. Interestingly, the results obtained with the radial basis function SVM kernel showed no substantial differences to those obtained with linear SVM kernel, especially for right VOIs (see rank correlations in “Predicted MSA: originalgraded versus thresholdedbinary dataset and linear versus RBFkernel SVM” section).
Considering the differences using the two training methods, general recommendations can be derived. If the accuracies computed with originalgraded and thresholdedbinary training sets are similar and the dataset analyzed is small (i.e., composed of a small number of cases), the thresholding could cause significant loss of information due to the collapsing of configurations, and it would be preferable to choose the originalgraded set to train the predictor and obtain the full set of performance scores. However, if the dataset analyzed is large, as for the data presented here, it is feasible to choose the thresholdedbinary set as we did.
MSA variants: advantages and drawbacks
The main drawback of the classic full information MSA is the need for 2^{n} performance scores corresponding to the binary configurations (here 256). These numbers quickly increase with the number of elements of interest, requiring, for example, 1024 configurations for ten VOIs. For this reason, the application of the standard full information predicted MSA is limited to a small number of VOIs that need to be carefully selected. The number of brain regions that can be investigated properly with the standard MSA approach depends on the available sample size, but is typically limited to around ten brain regions, given the typical sample sizes of large multicentre stroke studies. Otherwise, the number of unknown lesion configurations and consequently the number of behavioral scores that need to be predicted grows too large, which also represents a considerable limitation for any machine learning technique. Frequently however, a resolution of about ten regions of interest provides a meaningful scope for the interpretation of lesion findings.
In this context we also investigated a variant of MSA, the estimated MSA, which is useful in studies where the number of system elements is too large to enumerate all configurations in a straightforward manner, such as studies focusing on many small VOIs or even single voxels. This MSA variant is computationally convenient, because it can sample orderings and calculate an unbiased estimator of the contribution of each element. MSA variants were already presented by Keinan et al. [26] where the authors focused on the analysis of large complex networks. Keinan et al. showed that the estimation and prediction variants successfully allowed the analysis of several neurocontrollers consisting of up to 100 neural elements. The contributions we obtained in the present study with estimated MSA were almost identical to the ones obtained with predicted MSA, and this result is encouraging in the perspective of analysing a larger number of more finely resolved anatomical or functional brain regions. In this context, it may eventually even be possible to apply the MSA for lesion inferences at the voxellevel. In doing so, it would no longer be necessary to use thresholds for binarization, since lesions at the voxel level produce binary states of lesioned and intact elementary nodes. However, such a feasibility analysis of maximum spatial resolution is beyond the scope of the present study and subject to future investigations.
MSA versus other multivariate lesion inference approaches
How does MSA differ to other multivariate lesion inferences? Another multivariate approach is multiarea pattern prediction (MAPP) [24], which is based on SVM and offers a way of comparing MSA and MVPA [17] strategies. While not identical to MVPA, MAPP operates in the same spirit, by computing the leaveoneout crossvalidation with n different datasets (n = number of regions), obtained respectively by removing each single region one at a time. In this way, we can measure how important each region is for the prediction procedure (i.e., by its individual contribution to the prediction error [24]). Similar to MSA, MAPP makes use of SVM in order to compute the RMSE in the leaveoneout crossvalidation procedure. Like MSA, it is also applied to the thresholdedbinary dataset with the corresponding performance scores, but in contrast to MSA, it does not require the full set of lesion configurations with associated performance scores.
It is also important to mention that controlling for total lesion volume can have a considerable impact on the lesionsymptom mapping approach [21]. In fact considering lesion size can be especially important to separate the specific effects of damage to a particular voxel from effects resulting from the generally higher damage likelihood in patients with large lesions compared to patients with small lesions. In this context Mirman et al. [37] reported similar anatomical results for univariate and multivariate approaches after accounting for lesion size.
An exhaustive numerical comparison between MSA and MAPP, or other recent multivariate approaches (i.e. SVRLSM [21]), should be based on groundtruth simulations [19], which is beyond the scope of the present project.
Conclusions
The MSA approach provides a new, principled method for the objective, multivariate computation of regional causal contributions to brain function. The approach reveals characteristic contribution patterns for behavioral and cognitive functions based on clinical scores and may provide useful guidance for rehabilitation. The method requires conditioning of the data, and we showed here that some parameters are crucial in the analysis: the threshold for binarization of graded lesion patterns, the choice of the algorithm for predicting unknown behavioral scores, and the choice of the training set. We demonstrated that there are no particular predictors or thresholds for the binarization that generally perform better than others. The choices of these settings are subjective, but we provide some general recommendations which, when considered in combination with a sensitivity analysis on these parameters, can be helpful for finding the best approach for given datasets.
In general, the results presented here are still preliminary, but indicate how MSA may allow building a matrix of causal functional contributions and provides useful guidance for rehabilitation.
Abbreviations
 MSA:

multiperturbation Shapley value analysis
 NIHSS:

National Institutes of Health Stroke Scale
 SVM:

support vector machine
 VBM:

voxelbased morphometry
 VLSM:

volumebased lesion symptom mapping
 VAL:

voxelbased analysis of lesions
 AnaCOM:

anatomoclinical overlapping maps
 LBM:

lesion behavior mapping
 MVPA:

multivariate pattern analysis
 VOI:

volume of interest
 FLAIR:

fluid attenuated inversion recovery
 DWI:

diffusion weighted imaging
 MRI:

magnetic resonance imaging
 kNN:

knearest neighbor
 RMSE:

root mean squared error
 RBF:

radial basis function
 mRS:

modified Rankin Scale
 MAPP:

multiarea pattern prediction
References
 1.
Broca P. Remarque sur le siège de la faculté du langage articulé, suivies d’une observation d’aphemie (perte de la parole). Bull Soc Anat Paris. 1861;6:330–57.
 2.
Rorden C, Karnath HO. Using human brain lesions to infer function: a relic from a past era in the fMRI age? Nat Rev Neurosci. 2004;5:813–9.
 3.
Young MP, Hilgetag CC, Scannell JW. On imputing function to structure from the behavioural effects of brain lesions. Philos Trans R Soc Lond B Ser B Biol Sci. 2000;355:147–61.
 4.
Teuber HL. Physiological psychology. Annu Rev Psychol. 1995;6:267–96.
 5.
Lomber GL, Payne BR. Removal of two halves restores the whole: reversal of visual hemineglect during bilateral cortical or collicular inactivation in the cat. Vis Neurosci. 1996;13:1143–56.
 6.
Kapur N. Paradoxical functional facilitation in brainbehaviour research. A critical review. Brain J Neurol. 1996;119:1775–90.
 7.
Kapur N. The paradoxical brain. Cambridge: Cambridge University Press; 2011.
 8.
Sprague J. Interaction of cortex and superior colliculus in mediation of visually guided behavior in the cat. Science. 1966;153:1544–7.
 9.
Wright IC, McGuire PK, Poline JB, Travere JM, Murray RM, Frith CD, Frackowiak RS, Friston KJ. A voxelbased method for the statistical analysis of gray and white matter density applied to schizophrenia. NeuroImage. 1995;2:244–52.
 10.
Ashburner J, Friston KJ. Voxelbased morphometry—the methods. NeuroImage. 2000;11:805–21.
 11.
Frank RJ, Damasio H, Grabowski TJ. Brainvox: an interactive, multimodal visualization and analysis system for neuroanatomical imaging. NeuroImage. 1997;5:13–30.
 12.
Bates E, Wilson SM, Saygin AP, Dick F, Sereno MI, Knight RT, Dronkers NF. Voxelbased lesionsymptom mapping. Nat Neurosci. 2003;6:448–50.
 13.
Rorden C, Brett M. Stereotaxic display of brain lesions. Behav Neurol. 2000;1:191–200.
 14.
Kinkingnéhun S, Volle E, PélégriniIssac M, Golmard JL, Lehéricy S, Du Boisguéheneuc F, ZhangNunes S, Sosson D, Duffau H, Samson Y, Levy R, Dubois B. A novel approach to clinicalradiological correlations: anatomoclinical overlapping maps (AnaCOM): method and validation. NeuroImage. 2007;37:1237–49.
 15.
Rorden C, Fridriksson J, Karnath HO. An evaluation of traditional and novel tools for lesion behavior mapping. NeuroImage. 2009;44:1355–62.
 16.
Yekutieli D, Benjamini Y. Resamplingbased false discovery rate controlling multiple test procedures for correlated test statistics. J Stat Plan Infer. 1999;82:171–96.
 17.
Smith DV, Clithero JA, Rorden R, Karnath HO. Decoding the anatomical network of spatial attention. Proc Natl Acad Sci USA. 2013;110:1518–23.
 18.
Forkert ND, Verleger T, Cheng B, Thomalla G, Hilgetag CC, Fiehler J. Multiclass support vector machinebased lesion mapping predicts functional outcome in ischemic stroke patients. PLoS One. 2015;10(6):e0129569. doi:10.1371/journal.pone.0129569.
 19.
Mah YH, Husain M, Rees G, Nachev P. Human brain lesion deficit inference remapped. Brain J Neurol. 2014;137:2522–31.
 20.
Karnath HO, Smith DV. The next step in modern brain lesion analysis: multivariate pattern analysis. Brain J Neurol. 2014;137:2405–7.
 21.
Zhang Y, Kimberg DY, Coslett HB, Schwartz MF, Wang Z. Multivariate lesionsymptom mapping using support vector regression. Hum Brain Mapp. 2014;35:5861–76.
 22.
Corbetta M, Ramsey L, Callejas A, Baldassarre A, Hacker CD, Siegel J, Astafiev SV, Rengachary J, Zinn K, Lang C, Connor L, Fucetola R, Strube M, Carter A, Shulman G. Common behavioral clusters and subcortical anatomy in stroke. Neuron. 2015;85:927–41.
 23.
Keinan A, Sandbank B, Hilgetag CC, Meilijson I, Ruppin E. Fair attribution of functional contribution in artificial and biological networks. Neural Comput. 2004;16:1887–915.
 24.
Zavaglia M, Forkert ND, Cheng B, Gerloff C, Thomalla G, Hilgetag CC. Mapping causal functional contributions derived from the clinical assessment of brain damage after stroke. NeuroImage Clin. 2015;9:83–94.
 25.
Keinan A, Kaufman A, Sachs N, Hilgetag CC, Ruppin E. Fair localization of function via multilesion analysis. Neuroinformatics. 2004;2:163–8.
 26.
Keinan A, Sandbank B, Hilgetag CC, Meilijson I, Ruppin E. Axiomatic scalable neurocontroller analysis via the Shapley value. Artificial Life. 2006;12:333–52.
 27.
Kaufman A, Keinan A, Meilijson I, Kupiec M, Ruppin E. Quantitative analysis of genetic and neuronal multiperturbation experiments. PLoS Comput Biol. 2005;1:e64.
 28.
Kaufman A, Kupiec A, Ruppin E. Multiknockout genetic network analysis: the Rad6 example. In: Proceedings of the 2004 IEEE computational systems bioinformatics conference (CSB). 2004. p. 1–9.
 29.
Kaufman A, Serfaty C, Deouell LY, Ruppin E, Soroker N. Multiperturbation analysis of distributed neural networks: the case of spatial neglect. Hum Brain Mapp. 2009;30:3687–95.
 30.
Brott T, Adams HP, Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Eberle R, Hertzberg V. Measurements of acute cerebral infarction: a clinical examination scale. Stroke. 1989;20:864–70.
 31.
Thomalla G, Cheng B, Ebinger M, Hao Q, Tourdias T, Wu O, Kim JS, Breuer L, Singer OC, Warach S, Christensen S, Treszl A, Forkert ND, Galinovic I, Rosenkranz M, Engelhorn T, Köhrmann M, Endres M, Kang DW, Dousset V, Sorensen AG, Liebeskind DS, Fiebach JB, Fiehler J, Gerloff C. DWIFLAIR mismatch for the identification of patients with acute ischaemic stroke within 4·5 h of symptom onset (PREFLAIR): a multicentre observational study. Lancet Neurol. 2011;10:978–86.
 32.
Cheng B, Brinkmann M, Forkert ND, Treszl A, Ebinger M, Köhrmann M, Wu O, Kang DW, Liebeskind DS, Tourdias T, Singer OC, Christensen S, Luby M, Warach S, Fiehler J, Fiebach JB, Gerloff C, Thomalla G. Quantitative measurements of relative fluidattenuated inversion recovery (FLAIR) signal intensities in acute stroke for the prediction of time from symptomonset. J Cereb Blood Flow Metab. 2013;33:76–84.
 33.
Collins D, Holmes C, Peters T, Evans A. Automatic 3D model based neuroanatomical segmentation. Hum Brain Mapp. 1995;3:190–208.
 34.
Shapley LS. Stochastic games. Proc Natl Acad Sci USA. 1953;39:1095–100.
 35.
Cortes C, Vapnik V. Supportvector networks. Mach Learn. 1995;20:273–97.
 36.
Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2013;2:1–27.
 37.
Mirman D, Zhang Y, Wang Z, Coslett HB, Schwartz MF. The ins and outs of meaning: behavioral and neuroanatomical dissociation of semanticallydriven word retrieval and multimodal semantic recognition in aphasia. Neuropsychologia. 2015;76:208–19.
Authors’ contributions
MZ, NF and CCH designed the study, MZ and NF performed data analysis, BC, GT and CG contributed data, MZ, NF and CCH wrote the manuscript and all authors commented critically on the results and on the manuscript. All authors read and approved the final manuscript.
Competing interests
Christian Gerloff has received fees as a consultant or lecture fees from Bayer Vital, Boehringer Ingelheim, EBS technologies, Glaxo Smith Kline, Lundbeck, Pfizer, Sanofi Aventis, Silk Road Medical, and UCB. Götz Thomalla has received fees as a consultant or lecture fees from Covidien and Boehringer Ingelheim.
Availability of data and materials
The dataset supporting the conclusions of this article is included within the article (Fig. 1).
Ethics approval and consent to participate
PREFLAIR: The study was approved by the local ethics committees at all centres. Either written or verbal informed consent was obtained for all patients, as required by local legislation.
Funding
Research was supported by the ERANET NEURON project “BEYONDVIS”, BMBF 01EW1002, the SFB 936 “Multisite Communication in the Brain” (Projects A1, C1, C2, Z3) as well as TRR 169 “CrossModal Learning”, Project A2.
Author information
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Brain lesions
 Multiperturbation Shapley value analysis (MSA)
 Game theory
 Lesion inference
 Functional prediction