In this study, we report the results of a coordinate-based ALE meta-analysis of brain activation during phonemic and semantic verbal fluency tasks in healthy volunteers. The main clusters of brain activation were seen in the left frontal lobe, specifically the IFG, MFG and medial frontal gyrus (BA 6, 9, 44, 45, 47), as well as in the anterior cingulate gyrus (ACC) (BA 24, 33). These results confirm previous studies suggesting that brain regions primarily in the left prefrontal cortex, particularly the LIFG and LMFG, are involved in word production and speech processing in verbal fluency tasks [4, 55–60]. Regarding the ACC, phonemic verbal fluency tasks predominantly activated the left (BA 32, 24) and right ACC (BA 32), whereas semantic verbal fluency tasks activated only the left ACC (BA 32). This is in line with previous studies suggesting that the cingulate gyrus (BA 32, 24) is activated during word generation and may therefore reflect the attentional demands of verbal fluency tasks [43, 55, 60]. Patients with bilateral anterior cerebral infarction, for example, often suffer from akinetic mutism and verbal fluency deficits. Furthermore, blood flow in the anterior cingulate gyrus (BA 24) increases during the processing of single words or letters.
The left parietal precuneus (BA 7) was activated during both phonemic and semantic fluency tasks. The precuneus (BA 7) is involved in phonemic discrimination and working memory [56, 57, 62, 63] and has repeatedly been associated with the processing of phonological information. Furthermore, this region plays a central role in visual attention to stimuli and speech.
Further clusters of activation included the left and right insula, the left thalamus and putamen, as well as the right claustrum and caudate head. Another cluster of activation was seen in the cerebellum. There is evidence that the (left) sub-lobar insula is involved in speech processing and the execution of verbal fluency tasks. Specifically, the left anterior insula has been suggested to be involved in the articulatory planning of orofacial movements. A systematic review by Price reports that speech production leads to increased activation in the cerebellum, the anterior insula and the left putamen. The ACC and the head of the caudate have been found to be involved in word selection. The initiation and execution of movements during speech production increase activation in the left putamen. The thalamus has also been shown to be involved in the processing of verbal fluency.
Brain activation in the processing of phonemic versus semantic verbal fluency tasks
As can be seen in Tables 3 and 4, Brodmann area 44 was involved only in the processing of phonemic verbal fluency tasks, whereas BA 9, 45 and 47 were activated in both phonemic and semantic verbal fluency tasks. The finding for BA 44 is in line with previous studies suggesting that the posterior-dorsal LIFG (BA 44) is specifically involved in the processing of phonemic information [6, 11, 12]. Phonemic fluency is most likely triggered by subvocal syllabification that overlaps with processes of inner speech such as motor programming and articulation, as indicated by stronger activations of the posterior LIFG (BA 44; Figure 2, blue) adjacent to (pre)motor areas.
Contrary to the hypothesis that the anterior-ventral LIFG (BA 45, 47) is specifically involved in the processing of semantic information, BA 45 and 47 were activated in both semantic and phonemic verbal fluency tasks. Previous studies also revealed an activation of BA 45 and 47 in the processing of phonemic and semantic verbal fluency tasks [3, 37, 42]. These results are consistent with the assumption that phonological search processes are not exclusively based on phonemic information, but may also rely on semantic facilitation. A variety of previous studies failed to find evidence for the hypothesis that semantic processing preferentially activates anterior ventro-lateral regions of the PFC when compared to phonological processing [13, 69–72]. A recent study that directly compared phonemic and semantic verbal fluency tasks while controlling for the effects of task demand implies that activity in the anterior-ventral LIFG (BA 45) is mainly related to task demand and individual ability. In summary, our results support the hypothesis that the posterior LIFG is specialized for the use of phonemic material but fail to confirm the hypothesis that the anterior LIFG is specifically involved in the processing of semantic information.
The subtraction analysis revealed no cluster of significantly greater activation during the processing of phonemic than of semantic verbal fluency tasks. Previous studies suggested that caution should be exercised when carrying out formal comparisons of ALE meta-analyses in which the two data sets are disparate in the total number of foci. In these cases, it is impossible to say with any certainty whether the difference maps reflect activation differences across groups of studies or simply show the effect of one group having a greater number of coordinates. In order to improve the sensitivity of the subtraction analysis and increase the number of foci included in the semantic verbal fluency map, we added the coordinates of the semantic verbal fluency tasks from the six studies that investigated both phonemic and semantic verbal fluency to the semantic part of the subtraction analysis. Thus, data set A (“phonemic”) of the subtraction analysis included 15 studies and data set B (“semantic”) 13 studies. In our previous analysis, the subtraction analysis revealed greater activation during phonemic than semantic verbal fluency tasks in a cluster in the LIFG. Following the inclusion of two additional semantic verbal fluency studies in the analyses of the revised manuscript, the subtraction analysis no longer revealed different activation patterns in phonemic compared to semantic verbal fluency tasks. The difference in activation observed in the previous analysis may have been the result of power differences between the two tasks, because the number of included studies and foci in the phonemic verbal fluency task was significantly higher than in the semantic verbal fluency task. The fact that the clusters of activation in the left hemisphere coincided in phonemic (Table 3) and semantic (Table 4) verbal fluency tasks except for BA 24 (ACC) and 7 (Putamen) substantiates this assumption.
Previously, a domain-specific activation in the left posterior temporal cortex near the middle temporal gyrus was found for semantic processing [9, 11, 12, 60]. In the current meta-analysis, the previously reported activation of the left temporal gyrus during semantic verbal fluency tasks [10–12, 43] could not be replicated. This may be due to the lower overall activation in semantic verbal fluency tasks and the lower number of semantic verbal fluency studies included in the meta-analysis.
The studies included in our meta-analysis differed regarding design, methodology, and study population. As shown in Tables 1 and 2, the included studies differ in their stimulus material, baseline condition and language (English, German, Dutch or Japanese) as well as in the kind of stimulus presentation (auditory versus visual) and response generation (overt versus covert). These differences might have affected the results of our analysis.
The included experiments used two different types of baseline conditions. Twenty-two of the 28 experiments (15 of 21 phonemic and 2 of 13 semantic fluency experiments) involved a covert or overt repetition of a given word (“rest”) or of a familiar sequence (e.g., forward counting, days of the week or months of the year). The performance of such standardized language production requires at least some low-level phonologic processing. When subtracted from the experimental task, such a baseline would at best attenuate phonologic activity in the final images, which is most likely localized to the more posterior and dorsal areas of the LIFG [6, 73]. Consequently, most phonemic experiments may have underestimated the extent of phonologic activity. The second type of baseline condition was a passive task, such as silent rest or visual fixation of a cross or symbol. There is some evidence that a functionally connected brain network including the LIFG might be associated with resting states [74–76]. In this framework, the effects would be opposite to those of a standardized language production baseline task. Semantic activity would then be underestimated in semantic experiments that used a resting-state baseline condition (two of thirteen semantic experiments).
Performance in verbal fluency tasks depends on the difficulty of the stimulus material. As can be seen in Table 2, the majority of English and German phonemic verbal fluency studies used variations of the COWAT stimulus material (FAS) in addition to further letters. The use of different stimulus material could have affected our results because previous studies suggested that the CFL subtest of the COWAT is more difficult than the FAS subtest. Lacy and colleagues, on the other hand, found comparable performance in the two forms of the COWAT. Borowski and colleagues investigated the association between different letters and their difficulty. Based on the frequency of the generated words, the authors categorized H, D, M, W, A, B, F, P, T, C, S as easy English letters, I, O, N, E, G, L, R as moderately difficult letters and Q, J, V, Y, K, and U as hard letters. According to this classification, the studies included in our meta-analysis used only easy to moderate letters. The most frequently used semantic categories were furniture (6 of 13), animals (5 of 13) as well as fruits, food, body parts (4 of 13) and vegetables, clothes and colors (3 of 13).
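The letter classification attributed to Borowski and colleagues above can be written as a simple lookup. The following Python sketch is illustrative only: the names `DIFFICULTY_GROUPS`, `letter_difficulty` and `set_difficulty` are assumptions introduced here, and only the letter groupings themselves are taken from the text.

```python
# Letter groupings exactly as quoted in the text above.
# All names in this block are hypothetical helpers for illustration.
DIFFICULTY_GROUPS = {
    "easy": "HDMWABFPTCS",
    "moderate": "IONEGLR",
    "hard": "QJVYKU",
}

# Invert the grouping into a letter -> difficulty lookup table.
LETTER_TO_DIFFICULTY = {
    letter: level
    for level, letters in DIFFICULTY_GROUPS.items()
    for letter in letters
}


def letter_difficulty(letter):
    """Return 'easy', 'moderate' or 'hard'; letters not listed in the
    text are reported as 'unclassified'."""
    return LETTER_TO_DIFFICULTY.get(letter.upper(), "unclassified")


def set_difficulty(letters):
    """Classify every letter of a stimulus set such as 'FAS' or 'CFL'."""
    return [letter_difficulty(ch) for ch in letters]


if __name__ == "__main__":
    for stimulus_set in ("FAS", "CFL"):
        print(stimulus_set, set_difficulty(stimulus_set))
```

Under this grouping, F, A, S and C fall into the easy group and L into the moderate group, so both COWAT forms consist of easy to moderate letters.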
The majority of the participants in the current meta-analysis were native English or German speakers; one study consisted of Dutch participants and one of Japanese participants. The fact that different languages use different strategies for encoding grammatical information raises the question of whether a unitary network of brain regions specialized for processing grammar in a broad sense is involved in the processing of different human languages, or whether different languages impose distinct processing demands relying on non-identical neural mechanisms. Previous studies on language-dependent processing of verbal material suggest that different brain networks are involved in the processing of different languages [80, 81]. The language of the included studies might accordingly have affected the activation patterns. However, a secondary analysis excluding the Dutch and Japanese studies revealed the same results as the first analysis. Furthermore, Oberg and Ramirez suggested that as long as letter frequency was considered, the number of generated words was remarkably similar across different languages.
Subjects generated covert responses in 7 of 21 phonemic and 6 of 13 semantic verbal fluency studies. Whereas overt paradigms carry a risk of movement artifacts, covert verbal responses do not allow researchers to determine whether the subjects perform the task as instructed or to assess task performance. Because of the differences between covert and overt verbal fluency paradigms, it seems difficult to generalize the results from covert response paradigms to overt response paradigms. Furthermore, it is possible that the cognitive processes operating during covert verbal responding differ in some respects from those operating during overt verbal responding. Although a direct comparison of overt and covert responses in a stem completion task showed greater LIFG activation with overt than covert responses, the location of the peak of activation did not differ. In order to clarify the effect of response generation on brain activation, we statistically compared the two sets of foci by subtracting the ALE maps of the overt and covert verbal fluency tasks. The covert fluency data set included 14 experiments reporting 122 foci, the overt data set 17 experiments reporting 255 foci and the pooled data set 31 experiments yielding 377 foci. The subtraction of the covert versus overt verbal fluency maps resulted in a higher activation likelihood in the LIFG (BA 46; X: -52; Y: 27; Z: 17; 1816 mm³). No significant differences were seen when subtracting the overt from the covert verbal fluency map. We compared the modes of stimulus presentation in the same way. The auditorily presented fluency data set included 18 experiments reporting 184 foci, the visually presented data set 13 experiments reporting 178 foci and the pooled data set 31 experiments yielding 362 foci. Auditory presentation of the stimuli resulted in significantly greater activation in the left medial frontal gyrus (BA 8; 312 mm³) and the left insula (BA 13; 160 mm³).
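The pooled counts reported above can be checked with simple arithmetic, and the same numbers show how unevenly foci are distributed across the compared data sets, which is relevant to the caution noted earlier about comparing data sets with disparate numbers of foci. The following Python sketch is only an illustration: the dictionary layout and the function names `pooled` and `foci_per_experiment` are assumptions, and the counts are the ones stated in the text.

```python
# Experiment and foci counts as reported in the text.
# The data layout and helper names are hypothetical.
DATA_SETS = {
    "covert": {"experiments": 14, "foci": 122},
    "overt": {"experiments": 17, "foci": 255},
    "auditory": {"experiments": 18, "foci": 184},
    "visual": {"experiments": 13, "foci": 178},
}


def pooled(first, second):
    """Sum experiments and foci of two data sets."""
    return {
        key: DATA_SETS[first][key] + DATA_SETS[second][key]
        for key in ("experiments", "foci")
    }


def foci_per_experiment(name):
    """Average number of reported foci per experiment in a data set."""
    data = DATA_SETS[name]
    return data["foci"] / data["experiments"]


if __name__ == "__main__":
    print("covert + overt:", pooled("covert", "overt"))
    print("auditory + visual:", pooled("auditory", "visual"))
    for name in DATA_SETS:
        print(f"{name}: {foci_per_experiment(name):.1f} foci per experiment")
```

With these counts, the overt data set contributes on average 15 foci per experiment compared with roughly 9 for the covert data set, so the two maps entering the subtraction are not balanced in the number of coordinates.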
Regarding gender differences in the performance of verbal fluency tasks, previous studies revealed heterogeneous results. Among the functional imaging verbal fluency studies focusing on sex differences, no study has reported a statistically significant behavioral difference between groups of men and women. No activation difference was found for men and women selected either on the basis of the same high level of VF performance or on differential cognitive performances [5, 60]. Furthermore, a variety of non-imaging studies investigating the verbal fluency performance of healthy subjects also did not find differences between men and women in their phonemic or semantic verbal fluency performance [84–87]. On the other hand, Gauthier and colleagues reported sex effects in a sample of high performers in seven cortical structures (the left ITG, anterior and posterior cingulate, right ACC, SFG, dlPFC and lingual gyrus) during the processing of a phonemic verbal fluency paradigm. The majority of the studies included in our meta-analysis investigated brain activation during verbal fluency tasks in a sample of men and women without consideration of gender (Tables 1 and 2). Thus, we were not able to identify the activation patterns separately for men and women.
In conclusion, the aim of our meta-analysis was to compare brain activation during the processing of semantic and phonemic verbal fluency. Tables 1 and 2 show that the number of studies using German or English, a visual or auditory presentation of the stimulus material, or an overt or covert paradigm was comparable between the studies investigating phonemic and semantic verbal fluency. Thus, we would suggest that the effect of these confounding variables on brain activity was equally distributed across phonemic and semantic verbal fluency tasks. Nevertheless, future studies are needed that investigate brain activation during verbal fluency tasks separately for studies using different designs with respect to stimulus presentation, language or response generation. The number of studies included in our meta-analysis, specifically in the analysis of semantic verbal fluency tasks, was too small to compare subgroups of studies presenting the stimulus material visually or auditorily or using overt or covert paradigms.
Coordinate-based neuroimaging meta-analyses usually pool studies that use different statistical thresholds. As can be seen in Tables 1 and 2, the statistical thresholds of the individual studies in our meta-analysis range from strict family-wise error rate correction to uncorrected p-values of p < 0.005. Assuming equality across studies irrespective of their statistical threshold could have the consequence of giving more weight to studies using less strict statistical thresholds, as these are likely to report more significant findings than studies using stricter statistical thresholds. In our meta-analysis, six studies used an uncorrected statistical threshold. Additional file 1: Tables S1 and S2 show the number of foci from each experiment contributing to the significant clusters of phonemic and semantic verbal fluency. With 9 and 5 foci, respectively, the study of Abrahams and colleagues contributed by far the most foci to clusters 1 and 2 of phonemic verbal fluency. However, the other five studies with uncorrected statistical thresholds did not report more significant findings than the studies using stricter statistical thresholds.
A further limitation of our meta-analysis might be that the power of the analyses cannot be aggregated across the included studies, because the GingerALE software is not suited to correct for false negative results. On the other hand, this also means that ALE minimizes the risk of false positive results and is not susceptible to outlier effects. Meta-analytic results are often influenced by the heterogeneity of the included studies. Therefore, it is an aim of meta-analysis to statistically control for potential sources of heterogeneity. The ALE software did not allow the investigation of heterogeneity between the individual studies; therefore, we cannot fully exclude that the results might be influenced by heterogeneity among the individual studies. Nevertheless, we tried to minimize heterogeneity by defining relatively strict inclusion criteria. Furthermore, the new ALE algorithm is based on a random-effects model, which is more conservative than the fixed-effects model and incorporates both within-study and between-study variance.
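As a rough illustration of how an ALE value combines evidence across experiments rather than across individual foci, the following toy sketch computes the voxelwise union 1 − ∏(1 − MA_i) over per-experiment modeled activation (MA) values on a one-dimensional axis. It is a minimal sketch under stated assumptions and not the GingerALE implementation: the function names, the one-dimensional coordinates, the Gaussian width `sigma`, the peak probability `peak`, and the choice to take the maximum over foci within an experiment are all assumptions made for illustration.

```python
import math


def focus_probability(x, focus, sigma, peak):
    """Gaussian-shaped probability around one focus (toy 1-D model).
    `peak` is a hypothetical probability at the focus itself."""
    return peak * math.exp(-((x - focus) ** 2) / (2.0 * sigma ** 2))


def modeled_activation(x, foci, sigma, peak):
    """Per-experiment value at x: maximum over the experiment's foci,
    so several nearby foci from one experiment do not add up."""
    return max(focus_probability(x, f, sigma, peak) for f in foci)


def ale(x, experiments, sigma=5.0, peak=0.3):
    """Union across experiments: 1 - prod(1 - MA_i)."""
    remaining = 1.0
    for foci in experiments:
        remaining *= 1.0 - modeled_activation(x, foci, sigma, peak)
    return 1.0 - remaining


if __name__ == "__main__":
    one_experiment_three_foci = [[0.0, 0.0, 0.0]]
    two_experiments_one_focus = [[0.0], [0.0]]
    print(ale(0.0, one_experiment_three_foci))
    print(ale(0.0, two_experiments_one_focus))
```

In this toy setting, three coincident foci from a single experiment yield the same value as one focus, whereas two experiments agreeing on a location raise it, which reflects the idea that convergence across experiments drives the result.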