The present study combines time-frequency analysis at the sensor level with source waveform analysis of magnetoencephalography (MEG) data to explore the neural activity underlying the processing of an ABA-triplet streaming task. We furthermore challenged perception by contrasting four degrees of inter-tonal frequency separation, thereby enabling the formation of different perceptual states within one and the same polyrhythmic structure. To maintain sustained attention, the participants were instructed to focus on the slowest rhythm (B-tones). The results of the first two parts (presentation of asymmetric ABA-triplet sequences) revealed a clear increase in spectral power at approximately 2 Hz, which corresponds to the B-tone presentation rate, in the streaming (10-semitones) and intermediate frequency separation (4-semitones) conditions. This was in line with our hypothesis. Additionally, the A-tone presentation rate elicited steady-state-like activity at approximately 4 Hz. The ABA-triplet sequence used in the present study is usually heard as a galloping rhythm in which the A- and B-streams are embedded in the ABA-pattern. Hence, the A- and B-tone related activities at 4 Hz and 2 Hz are only accessible in the spectrum if the two streams are segregated. Our results, therefore, likely reflect the selective segregation of the polyrhythmic ABA-pattern into two monorhythmic A- and B-streams. The activity at approximately 10 Hz and 6.6 Hz, which corresponds to the A-B and B-A tone intervals of the ABA-triplets, also increased across the trials in the first two parts. In light of the present findings, one might speculate that the neural representation of different auditory sequences relies on neural entrainment to the temporal intervals between the constituent stimuli.
Therefore, when perception favors the one-stream condition (0-semitones), the corresponding presentation rates (10 Hz and 6.6 Hz) can be captured in the spectrum, whereas the other rhythms (2 Hz and 4 Hz) are suppressed, and vice versa in the case of segregation (10-semitones). Additionally, the time-frequency results demonstrated that the responses to the ABA-frequency distribution (approx. 10 Hz and 6.6 Hz) appeared to be sustained across the entire presentation of the non-streaming condition (0-semitones), whereas the B-tone related activity (2 Hz) emerged at approximately 0.5 s and reached its maxima at approx. 0.8 s and 2 s only during the streaming condition (10-semitones). Conversely, the spectral power at 10 Hz and 6.6 Hz was rather transient in all other conditions that allowed perceptual streaming (2-, 4- and 10-semitones). The streaming phenomenon is cumulative and needs a variable amount of time to build up; therefore, the appearance of the 2 Hz activity at about 0.5 s in the time-frequency plots likely reflects the streaming build-up period. Alongside this, the vanishing of the activity at approx. 10 Hz and 6.6 Hz could match the periods in which perception alternated in favor of stream segregation. Indeed, the statistical analysis revealed that the spectral power corresponding to the A-B and B-A time intervals of the ABA-triplets was significantly enhanced compared to the responses tuned to the separate A- and B-tones in the non-streaming scenario (0-semitones) and in the conditions of small and intermediate inter-tonal frequency separation (2- and 4-semitones).
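The relation between the tone timing and the four rates discussed above (2 Hz, 4 Hz, 10 Hz and 6.6 Hz) can be illustrated with a minimal numerical sketch. The onset times used here (A-tones at 0 and 250 ms, B-tone at 100 ms within a 500 ms triplet cycle) are assumptions chosen only because they reproduce the quoted rates; they are not restated from the stimulus description, and the idealised impulse trains say nothing about perception or about the measured MEG signal. All function and variable names are hypothetical.

```python
import numpy as np

FS = 1000            # sampling rate in Hz (assumed for the sketch)
DURATION = 10.0      # length of the simulated sequence in seconds
PERIOD = 0.5         # assumed triplet cycle: one ABA_ triplet every 500 ms
A_ONSETS = (0.0, 0.25)   # assumed A-tone onsets within a cycle (s)
B_ONSETS = (0.1,)        # assumed B-tone onset within a cycle (s)


def onset_train(onsets_s, period_s, duration_s, fs):
    """Unit-impulse train whose onsets repeat every period_s seconds."""
    n = int(round(duration_s * fs))
    x = np.zeros(n)
    n_cycles = int(round(duration_s / period_s))
    for k in range(n_cycles):
        for t in onsets_s:
            idx = int(round((k * period_s + t) * fs))
            if idx < n:
                x[idx] = 1.0
    return x


def power_at(x, fs, freq_hz):
    """Spectral power of x at the FFT bin closest to freq_hz."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spec[np.argmin(np.abs(freqs - freq_hz))]


# Rates implied by the within-triplet intervals (reciprocal of the interval).
ab_rate_hz = 1.0 / (B_ONSETS[0] - A_ONSETS[0])   # A-B interval of 100 ms
ba_rate_hz = 1.0 / (A_ONSETS[1] - B_ONSETS[0])   # B-A interval of 150 ms

# Idealised segregated streams: A-tones alone and B-tones alone.
a_stream = onset_train(A_ONSETS, PERIOD, DURATION, FS)
b_stream = onset_train(B_ONSETS, PERIOD, DURATION, FS)

print("A-B rate (Hz):", ab_rate_hz, " B-A rate (Hz):", round(ba_rate_hz, 2))
print("A stream power at 2 Hz:", power_at(a_stream, FS, 2.0))
print("A stream power at 4 Hz:", power_at(a_stream, FS, 4.0))
print("B stream power at 2 Hz:", power_at(b_stream, FS, 2.0))
```

Under these assumptions, the isolated A-train carries no energy at 2 Hz, whereas the isolated B-train does, which mirrors the argument that a 2 Hz component is tied to the B-tone rate.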
The statistical analysis furthermore showed that the steady-state activity related to the attended B-stream (2 Hz) increased significantly as the inter-tonal frequency difference between A- and B-tones was enlarged (from 0- to 10-semitones). This result lends further support to the idea that attention is a crucial factor in auditory streaming because it biases the auditory system towards a particular grouping or binding of sound-source elements in favor of the listener’s intention [19, 21]. A previous study by Xiang and colleagues, for instance, explored the mechanisms of temporal integration and its interaction with attention in the auditory system by using a streaming paradigm with two competing tones. The authors demonstrated that focusing the listeners’ attention on one of the two competing tempi significantly enhances its steady-state power. However, the two competing tones they used could primarily produce two auditory streams, unlike the asymmetric ABA-triplets used in the present study. Furthermore, it has been demonstrated previously that steady-state responses can be modulated by attention [35, 36]. Our experimental design, therefore, allowed us to explore the interaction between the temporal rates in one integrated polyrhythmic pattern and two segregated monorhythmic streams within one and the same tone sequence. On the other hand, our results revealed higher spectral power tuned to the A-tone presentation rate (4 Hz) than to the B-tone presentation rate (2 Hz) in the cases of intermediate and small frequency separation between tones, although attention was focused on the B-rhythm. It might be suggested that with small frequency differences between tones, such as those used in the second part (2- and 4-semitones), the perception of the B-tones cannot dominate the perception of the A-tones, and that this produces considerably higher activity at the target frequency of approximately 4 Hz.
It could therefore be speculated that a greater effort is needed to segregate the ABA-structure into separate A- and B-tone streams in the cases of small and intermediate frequency differences than in the pure streaming condition (10-semitones). In addition, it might be more difficult to follow the slower B-stream (2 Hz) instead of the twice as fast A-stream (4 Hz) at intermediate and small frequency separations than at greater frequency differences. Besides that, previous studies showed that steady-state responses are stronger at low frequency rates (below 16 Hz) when mediated by attention [21, 52]. Although attention was focused on the B-tones in our experiment, changing the inter-tonal frequency separation within the ABA-tone pattern revealed dissimilar efficiency of temporal integration of the separate A- and B-streams. It has been demonstrated previously that the P1 and N1 components of the human auditory evoked fields (AEFs) are larger when listeners perceive two segregated streams than one integrated stream, and that this magnitude augmentation is consistent with increasing frequency separation between the A- and B-tones. However, these authors showed that the B-tone related responses were always enhanced, regardless of the attended stream (A- or B-tones). Similarly, it has been proposed that the frequency separation between different sound sources of a polyrhythmic sequence is sufficient to provide the selective processing of a particular musical instrument; however, selective attention to one or another spatially separated element of this rhythm could additionally improve the segregation process. These findings together support the idea that attention in auditory streaming is not merely an intrinsic mechanism that augments the neural responses; rather, its effects are based on a specific interaction between the physical attributes of the stimuli.
Additionally, the present outcome is in line with the hypothesis that distinct neuronal populations are involved in the processing of A- and B-tones and suppression of one population might underlie the stream segregation phenomenon [11, 12].
Assuming that the steady-state activity in low frequency bands is generated by the periodic appearance of the evoked components in response to the A- and B-tones, we tested whether the source waveform of the response triggered by the attended B-tones of the ABA-triplets showed any significant effects on the evoked peaks. Moreover, examining the modulation of the source waveform components synchronized to each triplet of the ABA-streaming task is a traditional way to investigate the auditory streaming phenomenon. The analysis revealed higher amplitudes of the evoked components with increasing frequency separation, a finding that is in line with prior studies [2, 6, 8, 54–56]. Specifically, the P1 evoked component to the B-tones was significantly enhanced as the inter-tonal frequency difference increased. This implies that the enhancement of the evoked fields at the source level, together with the B-tone related activity derived from the time-frequency results, likely reflects the selective segregation of the attended B-stream. However, the source waveforms comprise more than one harmonic in the spectrum, and it is thus difficult to separate the streaming-related effects from the activity related to the physical features of the sounds. Elhilali and colleagues, for instance, demonstrated that frequency-distant spectral components are no longer heard as separate streams if presented synchronously rather than consecutively, while the neural activity increases with increasing frequency separation between tones.
Hence, the auditory evoked fields per se are not capable of fully explaining the perception of streaming.
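The assumption that steady-state activity arises from the periodic recurrence of evoked components, and the remark that such waveforms contain more than one harmonic, can be made concrete with a small simulation. This is a sketch under stated assumptions only: the evoked deflection is modelled as a hypothetical Gaussian bump (not a fitted AEF), repeated at an assumed 2 Hz rate, and the sampling rate, latency and width are illustrative values that do not come from the recorded data.

```python
import numpy as np

FS = 1000          # sampling rate in Hz (assumed)
PERIOD_S = 0.5     # one response every 0.5 s, i.e. a 2 Hz repetition rate
N_CYCLES = 20      # 10 s of simulated signal


def evoked_template(fs, period_s, peak_s=0.1, width_s=0.03):
    """Hypothetical single evoked deflection: a Gaussian bump within one cycle."""
    t = np.arange(int(round(period_s * fs))) / fs
    return np.exp(-0.5 * ((t - peak_s) / width_s) ** 2)


def amplitude_spectrum(x, fs):
    """Frequencies and normalised FFT magnitudes of a real signal."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    amps = np.abs(np.fft.rfft(x)) / len(x)
    return freqs, amps


# Repeat the same deflection periodically, as assumed for the steady state.
signal = np.tile(evoked_template(FS, PERIOD_S), N_CYCLES)
freqs, amps = amplitude_spectrum(signal, FS)


def amp_at(freq_hz):
    """Spectral magnitude at the FFT bin closest to freq_hz."""
    return amps[np.argmin(np.abs(freqs - freq_hz))]


for f in (2.0, 3.0, 4.0, 6.0):
    print(f"{f:.0f} Hz: {amp_at(f):.4f}")
```

In this idealised case the energy falls at the 2 Hz repetition rate and at its integer multiples (4 Hz, 6 Hz, and so on), with essentially nothing in between. A peak at 4 Hz could therefore stem from a harmonic of a 2 Hz response as well as from a genuinely separate 4 Hz response, which illustrates why the evoked waveforms alone are hard to interpret.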
In contrast to the first two parts, two recurring A- and B-tone streams were presented in the third part. Here, the temporal distribution between the A-B and B-A tones was always different, whereas the presentation rates of the A-tones and the B-tones per se were always regular, corresponding to 8 Hz and 4 Hz, respectively. The results demonstrated clear non-attentive steady-state activity at approx. 8 Hz and 4 Hz. Indeed, it has been shown that the auditory system prefers regular arrangements [39, 40]. Moreover, the integration of auditory streams, based on their regularities, could take place automatically. The mismatch negativity (MMN) component of event-related potentials, for instance, is elicited automatically by changes in a regular stimulus pattern [57–60]. Additionally, it has been found that the MMN also operates on the basis of auditory objects and that the integration of objects occurs pre-attentively in the auditory system. The experimental design applied in the third part could not provide two complementary percepts (integrated vs. segregated), as the ABA-triplets do. It could be speculated, therefore, that the two auditory streams were formed from the very first moment of their presentation. On the other hand, it has recently been demonstrated that stream integration can occur with irregular arrangements; however, it is likely that in the absence of active awareness, the auditory system integrates tone patterns based on their physical regularities.
In summary, the present findings suggest that the neural encoding of a streaming task relies on oscillatory entrainment to the stimulus presentation rates. However, two separate effects in the time-frequency data must be distinguished. The first is represented in our results by the distribution of the intervals between the A-B (10 Hz) and B-A (6.6 Hz) tones of the ABA-triplets (0-semitones). The second is represented by the 2 Hz and 4 Hz steady-state responses related to the B- and A-tones, derived from the conditions that allow perceptual streaming (2-, 4- and 10-semitones), alongside the steady-state effects of non-attentive listening (part 3). The present effects cannot be directly ascribed to the mechanisms underlying the various perceptual states, because the participants were not required to make streaming judgments during the trials. Nevertheless, these effects might be grounded in physiological hallmarks of the process that precedes the formation of a one- vs. two-stream percept. Hence, further study is necessary to show the differences in the spectral distribution for identical tonal-frequency separations under conditions of perceptual validation during integration vs. segregation.