The results of the current study suggest that accuracy of performance on the modified Concept Shifting Task (mCST) is robust to the effects of practice that could arise from repeated completion of the test over an average of eight days, twice a day (before and after a short break). Specifically, the effect sizes (both Hedges' g and Cohen's d) were small (all < .24) when comparing accuracy on the last vs. first test performance and after vs. before the session break on all experimental days. Therefore, accuracy on the mCST is a useful measure for monitoring frontal-cortical cognitive functioning over time.
On the other hand, practice appeared to contribute to faster completion of the test at the end compared to the beginning of the experiment. The effect sizes of this comparison were moderate (absolute values of g and d of .75 and .70, respectively). Similarly, participants were faster after vs. before the session break on all testing days, but the effect sizes of this comparison were small (absolute values of both g and d < .37). Therefore, even though accuracy of performance on the mCST was robust against effects of practice, the duration of performance improved with time, especially in the long term (on the last vs. the first mCST trial). Since studies of practice effects typically focus on task accuracy only [for example, see ], the effects of practice on the time to complete a test remain relatively unreported. However, this might be of clinical relevance. For instance, depression appears to be associated with a reduced speed of information processing, and thus patients with depression would be expected to complete the task more slowly than healthy controls. The current results suggest that it might be difficult to evaluate an improvement in the speed of performance on the mCST following a given treatment for depression, because such an improvement could be confounded by practice effects.
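The comparisons above rest on standardised mean differences. As a minimal sketch (not the authors' actual analysis code), Cohen's d with a pooled standard deviation and the Hedges' g small-sample correction can be computed as follows; all function names and inputs here are illustrative:

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d: standardised mean difference using the pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def hedges_g(d, n1, n2):
    """Hedges' g: Cohen's d multiplied by the small-sample correction factor J."""
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return j * d
```

Because J < 1, g is always slightly smaller in magnitude than d, which is why the two indices converge for the sample sizes reported here (e.g. .75 vs. .70).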
The improvement in the duration of performance on the mCST is likely due to the effects of practice and learning, in that participants were familiar with the general framework of the experiment, including the instruction change halfway through each experimental session and the type of stimuli used (letters or numbers). In general, the presence of practice effects in serial testing is associated with activation of various cognitive mechanisms, such as executive functions and memory. Both short-term and procedural memory are likely to be associated with the short- and long-term improvements in duration on the mCST, based on the finding that practice effects are consistently observed on neurocognitive tests of memory [4–6]. Consequently, the absence of practice effects on the duration of performance on the mCST could indicate potential deficits in cognitive functioning, making the test suitable for clinical applications.
Another reason for the improvement in the duration of performance on the mCST could be a shift in participants' trade-off between accuracy and speed of performance. Specifically, at the beginning of the experiment, participants may have been biased towards accuracy, whereas after multiple testing sessions, the bias may have shifted towards completing the task faster. Against this argument, however, is the finding that higher accuracy was significantly correlated with faster performance (shorter duration) in the current study (Figure 2), suggesting that participants were either fast and accurate or slow and inaccurate on the mCST. Changing trade-offs between accuracy and speed in psychological tests may occur less frequently in clinical samples, where the tests may be perceived as more relevant to the participants' own health. Specifically, participants in clinical settings may maintain motivation to engage with the task, as it has direct and personal relevance. Thus, if the change in speed of performance is due to trade-offs in motivation, it is not clear whether the improvement in speed observed in healthy controls would generalise to patient samples. However, if the speed improvement is due to practice effects, then patients should also show an increase in speed over time regardless of their motivation to complete the task.
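The accuracy–duration association reported above is a simple bivariate correlation. A minimal illustration with hypothetical values (not the study's data) shows the expected negative sign when more accurate participants are also faster:

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# Hypothetical data: higher accuracy (%) paired with shorter duration (s)
accuracy = [95, 90, 85, 80]
duration = [25, 30, 35, 40]
r = pearson_r(accuracy, duration)  # negative r: accurate participants were faster
```

A negative r here corresponds to the "fast and accurate vs. slow and inaccurate" pattern described for Figure 2.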
In task-switching paradigms, task switching leads to slower (increased) reaction times (RTs) [7, 8]. Since RTs were not measured in the current study, it can only be speculated that, despite the overall decrease in the duration of performance over time, the RTs (the time taken to cross out the first letter or number) might have increased due to participants either needing to endogenously reconfigure the task set or experiencing exogenous interference from earlier instructions.
Therefore, the decrease in duration on the mCST might have been indirectly due to both exogenous and endogenous processes affecting the RTs to the task stimuli in the current study. In other words, participants might have taken longer to execute their first response (for instance, to identify the letter or number to be crossed out first on each trial) but afterwards completed the trials faster. Alternatively, the current participants completed the mCST faster with time because they could reconfigure the task set more quickly and/or because there was less interference from new instructions, presumably from learning the procedural aspects of the task towards the end of the experiment. The fact that participants could predict the next instruction (crossing out letters or numbers in either ascending or descending order) but not the composition of the next trial (which letters or numbers would be presented and in what precise positions) offers an explanation for why improvement was found only for duration but not for accuracy of performance on the mCST.
The lack of practice effects on the accuracy of performance in the current study might be due to employing test forms (or computer screens) with varying types and locations of stimuli. It has been shown that, compared to identical test forms, alternate forms reduce practice effects on some memory tests. Furthermore, practice effects are also moderated by task difficulty. It appears that the current task was difficult to learn, since the precise location and type of stimuli changed with each trial and thus could not be memorised. Therefore, even though it may not be possible to completely remove the effects of practice from serial psychological testing, the mCST appears to be resistant to practice in terms of performance accuracy in healthy participants, and the task might thus be of use in assessing treatment effects over time in clinical samples.
One limitation of the current study is the method of meta-analysis used. As stated in the Results section, the current study used the random-effects model on the assumption that the five studies included in the analysis differed from each other methodologically and thus would most likely not share one common effect size. However, when the number of studies in a meta-analysis is low, the estimation of the between-study variance is compromised. In this case, the results of the meta-analysis should not be generalised to a wider population. One way of dealing with this problem would be to conclude that the mean weighted effect size describes only the studies in the current analysis, or even to refrain from computing a mean weighted effect size at all. This solution appears problematic for at least two reasons. Firstly, there is no consensus on what constitutes a 'small' number of studies in meta-analysis. Typically, in clinical research, a large number of studies located for the purpose of meta-analysis (for instance, N > 2000) is drastically reduced to N < 10, mainly due to the inability to extract adequate information from such sources. Thus, it is not uncommon to perform meta-analysis on as few as five primary studies [for example, see ]. Secondly, even though not reported here, the fixed-effect meta-analysis conducted on the current five studies produced the same results as the random-effects model in terms of similar mean weighted effect sizes, 95% CIs, and p-values. Furthermore, the effect sizes in the five primary studies were similar to each other and to the mean weighted effect size, and the results of the meta-analysis were in agreement with the classical statistical analysis (ANOVA) performed on the data. Therefore, regardless of the method of meta-analysis, the mean weighted effect sizes appear to accurately describe at least the current primary studies, especially for the accuracy data (Figures 3A and 3B).
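The agreement between the two models noted above is expected when study-level effects are homogeneous: the random-effects weights reduce to the fixed-effect weights as the between-study variance estimate approaches zero. A minimal sketch of both models using inverse-variance weighting and the DerSimonian–Laird estimator of tau-squared (an illustrative implementation, not the software used in the study):

```python
import math

def fixed_effect(effects, variances):
    """Fixed-effect pooled estimate: inverse-variance weighted mean and its SE."""
    w = [1 / v for v in variances]
    mean = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return mean, math.sqrt(1 / sum(w))

def dersimonian_laird_tau2(effects, variances):
    """DerSimonian-Laird estimate of the between-study variance (truncated at 0)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    mean, _ = fixed_effect(effects, variances)
    q = sum(wi * (e - mean) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi**2 for wi in w) / sw
    return max(0.0, (q - (len(effects) - 1)) / c)

def random_effects(effects, variances):
    """Random-effects pooled estimate: weights incorporate tau-squared."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    w = [1 / (v + tau2) for v in variances]
    mean = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return mean, math.sqrt(1 / sum(w))
```

With similar per-study effects, Q falls below its degrees of freedom, tau-squared truncates to zero, and the two pooled estimates coincide, mirroring the convergence reported here.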
Even though the current meta-analysis contains a small number of primary studies, its strength is that all studies employing the mCST to date were included in the analysis and that primary data from all five studies were available to the authors, which is uncommon in a typical meta-analysis. Therefore, the results of the current meta-analysis are not affected by publication bias arising from not including all available studies on the topic due to a limited search strategy, from restricting the search for primary studies to the English language only (the Tower of Babel Error), or from excluding non-significant primary results which may not have been published (the File-Drawer Problem).
Another limitation of the current work is that it was impossible to investigate practice effects on the mCST in more detail due to the limited amount of data available on this new task. For instance, other meta-analyses of practice effects in psychological tasks compared subgroups of studies or performed moderator analyses to determine what factors might contribute to practice effects, particularly if heterogeneity among studies was detected [14, 15]. In general, factors such as age, gender, and education were found to affect performance on the original CST, and thus the mCST should also be administered to larger samples to test for factors other than practice that can affect performance on this task.
Practice effects in psychological tests also depend on the number of repetitions of the task and the temporal proximity of repetitions. The five primary studies analysed in this article utilised the mCST for an average of only eight days, which might have been too short for the effects of practice to occur. On the other hand, this relatively short administration period was adequate for participants to show practice-related improvement in duration of performance. It would be of interest to test the effects of practice on this task over a longer period of time, such as one year. So far, preliminary evidence from Study 1, which continued for 20 days, suggests that accuracy of performance did not improve over time (the last vs. the first mCST trials) while participants completed the task faster over time (unpublished data). Therefore, preliminary data support the overall results collected over eight days and suggest that the mCST is prone to practice effects in terms of duration but not accuracy of performance over up to 20 days of testing. An experimental design utilising the mCST over a number of months would be more comparable to clinical protocols, which may require patients to complete the mCST over a longer period of time and less frequently than daily in order to investigate the long-term effectiveness of a treatment.