Neural mechanisms of interstimulus interval-dependent responses in the primary auditory cortex of awake cats

Background Primary auditory cortex (AI) neurons show qualitatively distinct response features to successive acoustic signals depending on the inter-stimulus intervals (ISI). Such ISI-dependent AI responses are believed to underlie, at least partially, categorical perception of click trains (elemental vs. fused quality) and stop consonant-vowel syllables (e.g., /da/-/ta/ continuum).
Methods Single unit recordings were conducted on 116 AI neurons in awake cats. Rectangular clicks were presented either alone (single-click paradigm) or in trains with variable ISI (2–480 ms) (click-train paradigm). Response features of AI neurons were quantified as a function of ISI: one measure was related to the degree of stimulus locking (temporal modulation transfer function [tMTF]) and another measure was based on firing rate (rate modulation transfer function [rMTF]). An additional modeling study was performed to gain insight into the neurophysiological bases of the observed responses.
Results In the click-train paradigm, the majority of the AI neurons ("synchronization type"; n = 72) showed stimulus-locking responses at long ISIs. The shorter cutoff ISI for stimulus-locking responses was on average ~30 ms and was level tolerant, in accordance with the perceptual boundary of click trains and of consonant-vowel syllables. The shape of the tMTF of those neurons was either band-pass or low-pass. The single-click paradigm revealed, at maximum, four response periods in the following order: 1st excitation, 1st suppression, 2nd excitation, then 2nd suppression. The sequence of 1st excitation followed by 1st suppression was found exclusively in the synchronization type, implying that the temporal interplay between excitation and suppression underlies stimulus-locking responses. Among these neurons, those showing the 2nd suppression had band-pass tMTF whereas those with low-pass tMTF never showed the 2nd suppression, implying that tMTF shape is mediated through the 2nd suppression. The recovery time course of excitability suggested the involvement of short-term plasticity. The observed phenomena were well captured by a single cell model which incorporated AMPA, GABAA, NMDA and GABAB receptors as well as short-term plasticity of thalamocortical synaptic connections.
Conclusion Overall, it was suggested that ISI-dependent responses of the majority of AI neurons are configured through the temporal interplay of excitation and suppression (inhibition) along with short-term plasticity.


Background
The perceptual quality of successive acoustic signals varies considerably depending on the inter-stimulus intervals (ISI). For example, when click signals are repetitively presented at ISI ≥ ~30 ms, individual signals are clearly heard as discrete events [1]; at ISI ≤ ~30 ms, they are perceptually fused together [2,3]. This ISI boundary, termed the "temporal-order threshold," has long been considered an important indicator of the temporal resolving capacity of the auditory system (reviewed by [4]). Another example of ISI-dependent perception is categorical perception of stop consonant-vowel syllables (CV syllables): if the ISI between the consonant release and voicing onset (voice onset time [VOT]) is shorter than a critical value (VOT boundary), the consonant is perceived as "voiced"; if the ISI exceeds this value, the consonant is perceived as "unvoiced" (e.g., /da/-/ta/ continuum [5]). In many languages including English, the VOT boundary lies at 20-40 ms with some variance among places of articulation (reviewed by [6]). Monkeys [7], chinchillas [8] and birds [9] all place the VOT boundary at approximately the same value, indicating that the categorical perception of CV syllables does not necessarily arise from a specific human speech mechanism but is based, at least partially, on general properties of the auditory system.
Case studies of patients with stroke lesions restricted to the bilateral primary auditory cortex (AI) reported that (1) their temporal-order threshold was elongated up to ~100 ms [10,11] and (2) they were severely impaired in the categorical perception of CV syllables [11,12]. These findings suggest that AI is critically involved in ISI-dependent differential perception regardless of whether the signals are phonetic or non-phonetic (reviewed by [13]).
A previous single unit study in the AI of un-anesthetized animals revealed that click trains produce qualitatively distinct response features depending on ISI: at ISI ≥ ~30 ms, stimulus-locking responses dominate; at ISI ≤ ~30 ms, responses occur only at the onset of the train [14]. A similar finding was obtained for AI responses to CV syllables: at ISI (VOT) ≥ ~30 ms, stimulus-locking responses occur to both the consonant and the vowel; at ISI (VOT) ≤ ~30 ms, responses occur only to the consonant [15]. Since these neurophysiological ISI boundaries (~30 ms) match both the temporal-order threshold and the VOT boundary (see above), it was suggested that the neural processes constraining AI stimulus-locking responses are also responsible for the perceptual boundaries of phonetic/nonphonetic acoustic signals [13].
The present study, by employing a single unit recording technique in un-anesthetized cats, thoroughly analyses how AI neurons respond to click trains of variable ISI. Then, by modeling the observed responses, we extract general principles governing various ISI-dependent behaviors of AI neurons, especially stimulus-locking responses.

Response Features in the Click-train Paradigm
The results are based on 116 AI neurons that showed statistically significant excitatory responses to the click stimuli (see Methods). We classified those neurons into 2 types depending on whether they had the capacity for stimulus-locking responses to click trains (synchronization type: n = 72) or not (non-synchronization type: n = 44).
As exemplified in Figure 1A, the majority of synchronization type neurons (n = 46) exhibited 4 qualitatively distinct response patterns (regions α-δ) depending on ISI. In region α (ISI ≥ 200 ms, A-1), only the onset response was evident. In region β (ISI: 38-200 ms), spikes were clearly time-locked to individual clicks: the temporal modulation transfer function (tMTF; see Methods) exceeded the statistically significant level (P < 0.05; A-2, dotted line). Hereafter, we call this response pattern "stimulus-locking responses." In region γ (ISI: 16-38 ms, A-1), spikes occurred intermittently without stimulus locking (A-2). The driven rate measured 50-500 ms after the onset of the train (rate modulation transfer function: rMTF; see Methods) exceeded the threshold for excitation (A-3, dotted line). In region δ (ISI ≤ 16 ms, A-1), the onset response was followed by an unresponsive period. Since region β of this subset was bordered by regions α and γ, we regarded the tMTF shape as "band-pass" (A-2; summarized in Fig. 2A-). As exemplified in Figure 1B, the remaining 26 synchronization neurons exhibited 3 response regions. Since region β of this subset was bordered only by region γ (B-1), we regarded the tMTF shape as "low-pass" (B-2; summarized in Fig. 2B-). Regardless of tMTF shape, the border between regions β and γ (hereafter, β-γ border), in other words, the shorter cutoff ISI for stimulus-locking responses, lay at ~30 ms on average (Table 1). This value is in line with a previous single unit study in the AI of un-anesthetized animals [14].
These subsets, especially the latter, have scarcely been encountered under anesthetized conditions [28,49,50]. They were, however, excluded from the following analysis since the main interest here is to extract general principles governing stimulus-locking responses (see Background).
Figure 1 Response profiles for 4 representative neurons (A, B: synchronization type; C, D: non-synchronization type) in the click-train paradigm. (1st column) Raster display of spike occurrence in response to 0.5-s-long click trains (horizontal bar) at variable inter-stimulus intervals (ISI; ordinate, left) or repetition rate (ordinate, right). Regions α-δ represent qualitatively distinct response patterns as defined below. (2nd column) The "temporal modulation transfer function (tMTF)," defined as Rayleigh values as a function of ISI. The vertical bar denotes the ISI range of "region β" where a statistically significant degree of stimulus-locking responses took place (P < 0.05, Rayleigh statistics). The ISI that gives the maximal Rayleigh value is denoted as the "best ISI" (arrow). In the majority of the synchronization neurons, the longer-ISI limit of region β (upper margin) was bordered by "region α" (A-1) where only the onset response was evident. (3rd column) The "rate modulation transfer function (rMTF)," defined as the mean driven rate (over 50-500 ms after the initiation of the train) as a function of ISI. The vertical bar denotes the ISI range where the mean driven rate exceeded the threshold for excitation (2*SD of spontaneous firing rate; dotted line). Within this ISI range, the portion where no stimulus-locking responses took place was termed "region γ." In the majority of the neurons examined, the shorter-ISI limit (lower margin) of region γ was bordered by "region δ" where only the onset response was evident. See text for details.
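The rMTF criterion described in the Figure 1 legend can be sketched as follows. This is only a minimal illustration, not the analysis code used in the study: the function names, the pooling of spikes across trials, and the definition of the driven rate as the evoked rate minus the mean spontaneous rate are assumptions. Only the 50-500 ms window and the 2*SD threshold are taken from the legend.

```python
from statistics import stdev

def driven_rate(spike_times_ms, n_trials, spont_rate_hz, window_ms=(50.0, 500.0)):
    """Mean firing rate (spikes/s) inside the window, minus the spontaneous rate.

    spike_times_ms: spike times pooled over trials, relative to train onset.
    Subtracting the spontaneous rate is an assumed definition of "driven rate".
    """
    start, stop = window_ms
    n_spikes = sum(1 for t in spike_times_ms if start <= t < stop)
    duration_s = (stop - start) / 1000.0
    return n_spikes / (n_trials * duration_s) - spont_rate_hz

def exceeds_excitation_threshold(rate_hz, spont_rates_hz):
    """True if the driven rate is above 2*SD of the spontaneous firing rate."""
    return rate_hz > 2.0 * stdev(spont_rates_hz)

# Hypothetical data: 2 trials, spontaneous rate sampled on 4 occasions.
spikes = [10, 60, 100, 150, 200, 250, 300, 350, 400, 450, 520]
spont = [4.0, 6.0, 5.0, 5.0]
rate = driven_rate(spikes, n_trials=2, spont_rate_hz=5.0)
excited = exceeds_excitation_threshold(rate, spont)
```

Evaluating `driven_rate` at each ISI and marking the ISIs where `exceeds_excitation_threshold` holds would give the vertical bar of the rMTF panels under these assumptions.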

Effects of the Stimulus Level
To examine effects of the stimulus level, we applied the click-train paradigm at various stimulus levels (in pe-SPL; see Methods). We examined 34 synchronization neurons whose firing activities could be isolated long enough for the detailed analysis. Among them, 29 neurons exhibited responses at 20 dB below the best SPL and 20 neurons did so at 40 dB below the best SPL. Figure 3A demonstrates responses of a representative neuron (best SPL: 65 dB). Regions α-δ were clearly identified at any stimulus level as long as statistically significant responses were elicited (Fig. 3A-1 to 3A-3; in the same format as Fig. 1A-1). However, the stimulus level influenced, to some extent, the ISI values that divide the response regions. First, at 20 dB below the best SPL, the α-β border (Fig. 3B, open triangle, middle) and β-γ border (filled triangle, middle) were roughly the same as those measured at the best SPL (corresponding symbols, right), whereas the γ-δ border became slightly longer (cross, middle). Second, at 40 dB below the best SPL, the α-β border (open triangle, left) became slightly shorter whereas the β-γ border (filled triangle, left) as well as the γ-δ border (cross, left) became longer.
These observations are confirmed by the population data ( Fig. 3C; the values were normalized to those measured at the best SPL).

Accumulation Effects: Region β
It has been widely reported that repetitive stimulation exerts accumulation effects on auditory neurons, especially those in the auditory cortex (reviewed by [16,17]). We addressed whether and how such effects influenced the observed responses. In the present and the following section, we paid special attention to regions β and δ, where the causative relationship between a given stimulus and spikes can be clearly identified. Figure 4A displays the number of evoked spikes (#spikes; bin width = 5 ms) of a representative synchronization neuron (the same neuron as in Fig. 1A) at the best ISI (92 ms; for simplicity, responses at this relative ISI value are uniformly adopted in the following analysis of region β).
#Spikes in each discharge cluster progressively decreased (Fig. 4B) indicating that the impact of successive clicks cumulatively reduced the responsivity to the following signals. This finding is confirmed by the population data ( Fig. 4C) where #spikes elicited by each click was normalized to that elicited by the 1st click (hereafter, control level).
To address the durability of this response degeneration, we presented a single click (= probe stimulus) at 1.0, 1.8 or 3.6 s after the termination of the click trains that were delivered at best ISI (Fig. 4D). Figure 4E illustrates the normalized #spikes (see Methods) at each time point (Roman numerals correspond to the probe stimuli depicted in Fig. 4D). The value was still smaller than unity (broken line) at 1.0 s but became equivalent to unity at 1.8 and 3.6 s (P < 0.05; Fisher's PLSD test following ANOVA). This indicates that the response degeneration lasted 1.0-1.8 s after the termination of the trains.
The above features, both during and after the click trains, correspond to the phenomenon called "frequency-dependent depression" [18].

Accumulation Effects: Region δ
At relatively long ISIs in region δ, spikes occasionally occurred after the onset responses (Fig. 1A-1). Such activities potentially hinder temporal precision in measuring the duration of the onset responses. Those activities were sufficiently suppressed at ISIs at or shorter than 0.9 multiples of the γ-δ border in all synchronization type neurons examined. This relative ISI value is uniformly adopted in the following analysis of region δ. Figure 5A shows the peri-stimulus time histogram of the same neuron as in Figure 4A at 15.6-ms ISI (region δ), which corresponds to 0.9 multiples of the γ-δ border. The onset response was 30 ms in duration (arrows, horizontal) into which 2 clicks fell (bold arrows, vertical), indicating that the discharge clusters elicited by the 1st and 2nd click merged together. The onset response included 31 spikes (Fig. 5B, closed circle). This number was much larger than the total #spikes elicited by the 1st and 2nd click in the trains delivered at the best ISI (square) (broken circles represent the same data as in Fig. 4B). The population data provide similar findings: (1) the onset response in region δ was on average 36.8 ± 18.1 ms in duration ( […] response in region δ; see above]) presented at the best ISI (square; broken circles denote the same data as in Fig. 4C) (P < 0.01, paired t-test). These findings suggest that when several consecutive clicks fall within a few tens of milliseconds, there is a synergy of impact. This phenomenon seemingly corresponds to "paired-pulse facilitation" [19].
Figure 2 The tMTF (left column) and rMTF (right column) for individual synchronization neurons with band-pass tMTF (A) or low-pass tMTF (B), and for non-synchronization neurons with low-pass rMTF (C) or high-pass rMTF (D).
Figure 4 Accumulation effect of responses in region β (at the best ISI) (A-C) and its durability (D, E). A, B: Peri-stimulus time histogram (bin width = 5 ms) (A) and sequential plot of the number of spikes (#spikes) involved in each discharge cluster (every discharge cluster was elicited by a single click) (B). C: Population data (mean ± SD) in the same format as B except that #spikes was normalized to that involved in the 1st discharge cluster (= control level; likewise below). D: The procedure for measuring the responsivity after the trains that were delivered at the best ISI: a single click (= probe stimulus; arrows) was presented at either 1.0 s (I), 1.8 s (II) or 3.6 s (III) after the termination of the train. E: Population data of the normalized #spikes measured according to D. *P < 0.05 (paired t-test). Note (1) responsivity gradually declined with the progression of the train (C), and (2) the response degeneration lasted for 1.0-1.8 s after the termination of the train (E).
The onset response in region δ was typically followed by an unresponsive period (Fig. 5A). To estimate the recovery time course from this response degeneration, we presented a probe stimulus at 1.0, 1.8 or 3.6 s after the termination of the train (Fig. 5E). The normalized #spikes at each time point is illustrated as filled bars in Figure 5F (same format as Fig. 4E). For comparison, the data in Figure 4E (broken bars), which depict the recovery time course after region β, are appended. The major findings are: (1) at 1.0 s, the value was smaller after region δ than after region β (P < 0.05, t-test); (2) at 1.8 s, the value after region δ was still below unity (P < 0.05) while the value after region β had become equivalent to unity. Together, it appears that response degeneration is more profound and longer lasting after region δ than after region β. This fits well with the principle of "frequency-dependent depression": response degeneration grows larger and longer at higher stimulation rates [18]. It is, thus, plausible that the unresponsive period in region δ was caused by "frequency-dependent depression" intense enough to abolish firing activities for a while.

Involvement of Post-activation Suppression in Stimulus-locking Responses
The neural processes of stimulus-locking responses (e.g., Fig. 4A) can be glimpsed if we pay special attention to the vector strength, from which the tMTF is derived (see Methods). The vector strength measures the degree of temporal confinement of spikes relative to the stimuli. It reaches a maximum (= 1.0) when spikes occur in exactly the same period with reference to the individual stimuli, and spikes (regardless of whether evoked or spontaneous) are completely absent in the remaining period. On the other hand, it reaches a minimum (= 0.0) when spikes occur entirely independently of the stimuli. It is, thus, quite conceivable that the capacity for stimulus-locking responses arises from neural processes that temporally confine spikes. To examine this, we conducted the single-click paradigm in which the dynamics of neural activity after a single click presentation was classified with reference to the spontaneous firing rate (excitation, suppression or spontaneous-level activities; see Methods). This analysis was performed on 35 neurons (non-synchronization type, n = 8; synchronization type, n = 27) whose spontaneous firing rate was high enough for detecting suppression. As exemplified in Figure 6A, all non-synchronization neurons examined (low-pass rMTF, n = 4; high-pass rMTF, n = 4) showed only a single excitatory period (1st excitation). In contrast, as exemplified in Figure 6B-D, all synchronization neurons showed a "post-activation suppression": the 1st excitation was followed by the suppression(s), suggesting the critical role of post-activation suppression in stimulus-locking responses.
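The behavior of the vector strength at its two extremes can be checked with a short sketch. The exact formulas used in the study are given in its Methods and are not reproduced here, so the code below uses the textbook definition of vector strength and, as an assumption, the common Rayleigh statistic 2·n·VS²; the function names and the example spike times are hypothetical.

```python
import math

def vector_strength(spike_times_ms, isi_ms):
    """Vector strength of spikes relative to a periodic stimulus of period isi_ms."""
    if not spike_times_ms:
        return 0.0
    # Map each spike to a phase (radians) within the stimulus cycle.
    phases = [2.0 * math.pi * (t % isi_ms) / isi_ms for t in spike_times_ms]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.hypot(c, s) / len(spike_times_ms)

def rayleigh_statistic(spike_times_ms, isi_ms):
    """Rayleigh statistic 2 * n * VS**2 (assumed convention, not from the paper)."""
    n = len(spike_times_ms)
    return 2.0 * n * vector_strength(spike_times_ms, isi_ms) ** 2

# Spikes at a fixed latency after every click (ISI = 100 ms): perfect locking.
locked = [10.0, 110.0, 210.0, 310.0]
# Spikes spread evenly over the stimulus cycle: no locking.
spread = [0.0, 25.0, 50.0, 75.0]
vs_locked = vector_strength(locked, 100.0)
vs_spread = vector_strength(spread, 100.0)
```

In this toy case `vs_locked` is 1.0 and `vs_spread` is 0.0 up to rounding, matching the maximum and minimum described above. Any suppression that removes spikes between the stimulus-locked clusters pushes the value toward 1.0.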
We sorted the synchronization neurons into 3 subsets based on the sequence of excitation and suppression (Table 2, rightmost column). The smallest subset (n = 5) showed an "E-S sequence" (e.g., Fig. 6B-2): only the 1st excitation and 1st suppression were evident. The largest subset (n = 13) showed an "E-S-E sequence" (e.g., Fig. 6C-2): the 1st suppression was followed by a rebound excitation (2nd excitation). This sequence has often been observed in AI [20,21] and somatosensory cortex [22]. The remainder (n = 9) showed an "E-S-E-S sequence" (e.g., Fig. 6D-2): the 2nd excitation was followed by another suppressive period (2nd suppression). This subset includes 3 neurons in which the 1st and 2nd suppression were separated by spontaneous-level activities instead of the 2nd excitation. Among the synchronization neurons examined, the 2nd suppression was present in 9 out of 14 neurons with band-pass tMTF, whereas it was absent in all the neurons with low-pass tMTF (Table 2). Fisher's exact test revealed a significant association between the presence of the 2nd suppression and the tMTF shape (P < 0.001). It is, thus, plausible that the interplay of the 2nd excitation and 2nd suppression constrains tMTF shape (band-pass vs. low-pass).
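The reported significance level can be reproduced from the counts given above with a small, self-contained sketch of a two-sided Fisher's exact test. Two things are assumptions rather than statements from the text: the number of low-pass neurons is taken as 13 (27 synchronization neurons minus the 14 with band-pass tMTF), and the two-sided p-value is computed by summing all tables that are no more probable than the observed one, which may differ from the convention of the software used in the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Rows: band-pass tMTF, low-pass tMTF.
# Columns: 2nd suppression present, 2nd suppression absent.
# The low-pass row (0, 13) assumes 27 - 14 = 13 low-pass neurons.
p_value = fisher_exact_two_sided(9, 5, 0, 13)
```

Under these assumptions `p_value` comes out below 0.001, consistent with the significance level reported above.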

Modeling of AI Temporal Behavior
de Ribaupierre and colleagues [23] reported in an AI single unit study that stimulus-locking responses were closely related to the temporal interplay of depolarization and hyperpolarization. Compelling evidence indicates that depolarization and hyperpolarization of AI neurons are chiefly based on excitatory post-synaptic potentials (EPSPs) and inhibitory post-synaptic potentials (IPSPs), respectively [24][25][26][27]. Notably, Cox and colleagues [27] demonstrated in AI slice preparations that electrical stimulation of thalamocortical afferent fibers (Fig. 7A, arrow) elicits at maximum 4 PSP components at the soma of a subset of AI neurons in the following order: a fast-EPSP, fast-IPSP, slow-EPSP and slow-IPSP (broken curves).
This scheme leads to the following prediction of how AI neurons generate firing responses to paired stimuli of variable ISI. First, when the 2nd stimulus (Fig. 7B-1, open arrow; in the same time scale as in Fig. 7A) is given during the fast-EPSP, it may readily elicit firing responses. However, the gap between the discharge clusters elicited by the 1st and 2nd stimulus would presumably be indistinct. Consequently, stimulus-locking responses, if any, would be weak (Fig. 7C, broken curve, region δ). Second, when the 2nd stimulus is given during the fast-IPSP (Fig. 7B-2) or the slow-IPSP (Fig. 7B-3), it may hardly elicit firing responses. Stimulus-locking responses would be negligibly small (Fig. 7C, broken curve, region γ and region α). Third, when the 2nd stimulus is given during the slow-EPSP (Fig. 7B-4), it may readily elicit firing responses. Those discharge clusters are expected to be separated by the fast-IPSP (see Fig. 7A). As a consequence, clear stimulus-locking responses would take place (Fig. 7C, broken curve, region β). Interestingly, the "predicted tMTF for paired stimuli" (broken curve) resembles the "observed tMTF on click trains" that was schematically drawn based on the data of the synchronization type with band-pass tMTF (solid curve; based on Table 1) except for the short ISI portion (shaded zone). This inconsistency may arise from the fact that if the stimuli are repetitively delivered at short ISI, the impact of the initial several stimuli leads to "frequency-dependent depression" intense enough to abolish firing responses to the following stimuli (Fig. 5A). Collectively, it is suggested that (1) as a principle, the temporal interplay of the PSP components underlies AI stimulus-locking responses and (2) at short ISI, intense frequency-dependent depression abolishes stimulus-locking responses.

Figure 5 Accumulation effect of responses in region δ (at 0.9 multiples of the γ-δ border) (A-D) and its durability (E, F). The format in A, B, D, E, F is the same as in Figure 4A-E, respectively. A: Peri-stimulus time histogram of the same neuron as in Figure 4A. The initial two clicks (arrows, bold) presumably contributed to the onset response (see text). B: #Spikes involved in the onset response (filled circle). For comparison, the data in Figure 4B were appended (in region β; broken circles) with the square representing #spikes elicited by the initial two clicks. C: Population data (mean ± SD) of the joint distribution for the duration of the onset response (abscissa) and the number of clicks involved in it (#clicks, ordinate; mean = 2.4). The double circle represents the data in A. D: Population data of the normalized #spikes in region δ (filled circle). For comparison, the data in Figure 4C were appended (in region β; broken circles) with the square representing #spikes elicited by the initial three (> 2.4) clicks. E: Procedures for measuring the responsivity after the train that elicited region δ responses. F: Population data of the normalized #spikes measured according to E (filled bar). For comparison, the data in Figure 4E were appended (after region β; broken bar). *P < 0.05, **P < 0.01 (paired t-test).
To numerically examine the above conjecture, we conducted a simulation study (see Methods for details). The model consists of a single AI neuron which receives external input via various combinations of AMPA, GABAA, NMDA and/or GABAB receptors, which were reported to chiefly mediate the fast-EPSP, fast-IPSP, slow-EPSP and slow-IPSP, respectively (Fig. 7A, dotted curves [27]). The amplitude of external input was modeled in two different ways: (1) taking a fixed value regardless of ISI or (2) varying in an ISI-dependent manner due to both frequency-dependent depression and paired-pulse facilitation. Each panel in Figure 8A shows a version of simulated membrane potentials responding at the best ISI (= 71 ms). If AMPA receptors alone were incorporated into the model (Fig. 8A-1), then each input signal elicited firing responses. However, robust onset responses, which were always observed in region β (Fig. 4A), were not evident. If NMDA receptors were added to the model (Fig. 8A-2), then firing responses occurred throughout the stimulus train; however, stimulus locking became unclear. If GABAA receptors were added to the model (Fig. 8A-3), then the neuron gained the capacity for stimulus locking. While this feature, more or less, resembled our physiological observation (Fig. 4A), the former differed from the latter in that: (1) the 3rd stimulus failed to elicit firing responses and (2) the discharge clusters elicited by the 4th and 5th stimulus merged together. These discrepancies were diminished if GABAB receptors were incorporated into the model (Fig. 8A-4; "4-receptor version").
Figure 6 Dynamics of firing responses of 4 representative neurons (A: non-synchronization type; B-D: synchronization type) in the single-click paradigm. A single click (arrow) was presented, and the spike occurrence was examined with raster dot plots (-1) and post-latency time histogram (bin width = 10 ms; -2). The thresholds for excitation and suppression were set at the mean spontaneous firing rate plus and minus 2*SD (dotted line), respectively. The qualitatively distinct responses were aligned in the following sequence: only excitation (A), excitation (E) followed by suppression (S) (hereafter, "E-S sequence"; B), "E-S-E sequence" (C) and "E-S-E-S sequence" (D).
Nonetheless, the 4-receptor version does not necessarily parallel our physiological observation. First, this model predicts the occurrence of stimulus locking at much shorter ISI (e.g., 14.3 ms ISI; Fig. 8B-1) compared to the physiological observation. In other words, the predicted β-γ border (4.5 ms; Fig. 8F, open arrow) was much shorter than the observed one (~30 ms, Table 1). Second, this model predicts "skipping" of firing responses at short ISI: for example, the spikes were expected to occur at every other stimulus at 11.8 ms ISI (Fig. 8C-1) and at every third stimulus at 5.6 ms ISI (Fig. 8D-1). These features, however, have scarcely been encountered in our physiological recording (but see [22] in somatosensory cortex of anesthetized rats). In fact, the shortening of ISI led to a systematic reduction of responsivity to the later clicks in a given train (e.g., Fig. 4A) while responsivity to the initial several clicks was relatively well preserved, as reported by many single unit studies using periodic signals [14,20,23,28,29,38]. These two discrepancies were diminished when we provided the external input with both frequency-dependent depression (FDD) and paired-pulse facilitation (PPF) ("full version"). Specifically, the capacity for stimulus locking at shorter ISI was considerably weakened (cf. Fig. 8B-2 to 8B-1, 8C-2 to 8C-1, and 8D-2 to 8D-1) with only the onset response being manifested at the γ-δ border (Fig. 8C-2) and in region δ (Fig. 8D-2). As a consequence, the β-γ border was prolonged to 25 ms (Fig. 8F, filled arrow).
At longer ISI, in marked contrast to short ISI, the 4-receptor version and full version predicted similar response features. For instance, at 164 ms the initial two clicks elicited stimulus-locking responses while the following ones did not ( Fig. 8E-1,-2), similarly to region α (Fig. 1A-1). This indicates that the frequency-dependent depression and paired-pulse facilitation are much less influential at longer ISI.
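Why such short-term plasticity matters mainly at short ISI can be illustrated with a generic sketch. The equations actually used in the study are given in its Methods and are not reproduced here; the code below swaps in a standard Tsodyks-Markram-style resource model as a stand-in, and the function name and all parameter values (`U`, `tau_rec_ms`, `tau_fac_ms`) are hypothetical choices made only for illustration.

```python
import math

def input_amplitudes(isi_ms, n_pulses, U=0.3, tau_rec_ms=800.0, tau_fac_ms=30.0):
    """Relative input amplitude for each pulse of a train (first pulse = 1.0).

    x: fraction of available synaptic resources (depression variable).
    u: utilization of resources (facilitation variable).
    Parameter values are illustrative assumptions, not fitted to the data.
    """
    x, u = 1.0, 0.0
    amps = []
    for k in range(n_pulses):
        if k > 0:
            # Between pulses, resources recover and facilitation decays.
            x = 1.0 - (1.0 - x) * math.exp(-isi_ms / tau_rec_ms)
            u = u * math.exp(-isi_ms / tau_fac_ms)
        u = u + U * (1.0 - u)   # each pulse increments utilization
        amps.append(u * x)      # amplitude released by this pulse
        x -= u * x              # resources consumed by this pulse
    return [a / amps[0] for a in amps]

short_isi = input_amplitudes(15.0, 10)   # hypothetical short-ISI train
long_isi = input_amplitudes(160.0, 10)   # hypothetical long-ISI train
```

With these illustrative parameters, the second pulse at the short ISI is not depressed relative to the first because the residual facilitation offsets the lost resources, whereas later pulses are strongly depressed. At the long ISI the facilitation has decayed before the next pulse arrives and the depression is moderate. The sketch is meant only to convey the qualitative ISI dependence, not the quantitative behavior of the full version of the model.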
Next, we investigated the main constraints on responses at long ISI. First examined was the contribution of the GABAB receptor-mediated IPSP. When the conductance of the GABAB receptors (gGABAB) was scaled by a factor of 0.1 (Fig. 8E-3; the conductance of the other receptors was kept constant, likewise below), responsivity to the initial several stimuli was enhanced (cf. Fig. 8E-2), thereby prolonging the α-β border (Fig. 8F, inset, dotted arrow). In contrast, when the conductance of the NMDA receptors (gNMDA) was reduced, responsivity to the 2nd and later stimuli decreased (Fig. 8E-4), enhancing the low-cut effect. Manipulation of the conductance of the AMPA or GABAA receptors did not greatly influence responses at long ISI (data not shown). Taken together, it is suggested that gGABAB and gNMDA act as the main constraints on the temporal filtering at long ISI: if the former is reduced, the tMTF tends to be more low-pass, whereas if the latter is reduced, the tMTF tends to be more band-pass.

Discussion
By using click signals, we investigated neural mechanisms underlying ISI-dependent responses of the AI neurons which had the capacity for stimulus-locking responses (synchronization type; Fig. 1A and Fig. 1B). The β-γ border, i.e., the shorter cutoff ISI for stimulus-locking responses, lay at ~30 ms on average (Table 1) and was level tolerant over high SPLs (Fig. 3C). The time course of excitability during (Figs. 4C and 5D) and after (Fig. 5F) the click trains suggested the involvement of short-term plasticity of thalamocortical synaptic connections. Comparison between response features to the click trains and to a single click (Table 2) led to the notion that the temporal interplay of excitation and suppression basically determines the capacity for stimulus-locking responses as well as tMTF shape. A single-cell dynamic model replicated the physiological data well (Figs. 8F and 8G), suggesting that ISI-dependent responses of the synchronization neurons are configured through the temporal interplay of the post-synaptic potentials (Fig. 7A) along with short-term plasticity of thalamocortical synaptic connections.

Perceptual Relevance of the Observed AI Responses
Case studies of AI-impaired patients indicated that AI is responsible for the temporal-order threshold (see Background). The mean value of the β-γ border (~30 ms; Table 1), i.e., the shorter cutoff ISI for stimulus-locking responses (e.g., Figs. 1A-1 and 1A-2), agrees well with the temporal-order threshold [4]. The β-γ border was nearly invariant at the best SPL and 20 dB below it (Fig. 3C, dotted line and filled triangle in the right, respectively) in accordance with the level tolerance of the temporal-order threshold over high SPL [30]. These findings strongly support the notion that the β-γ border serves as a neural correlate of the temporal-order threshold. Accordingly, it could be postulated that click trains are clearly heard as a series of "discrete" events as long as stimulus-locking responses dominate in AI. Taking into account the view that the temporal-order threshold and the perceptual boundary of CV syllables at least partially share common neural processes (see Background), it is possible that the β-γ border also serves as the basis for the perceptual boundary of CV syllables.
Psychological studies have revealed that a click train as a whole at ISI ≥ ~30 ms produces two kinds of sensation: at ISI of ~30-200 ms, it leads to a "rhythm" percept; at ISI ≥ ~200 ms, the rhythm percept fades away while the sensation of "fluctuation" remains [1]. Since the value of the α-β border (~175 ms; Table 1) as well as its level tolerance (Fig. 3C, open triangles) is consistent with the rhythm-fluctuation boundary (~200 ms), it is possible that region α dominantly represents the "fluctuation" percept whereas region β represents the "rhythm" percept.
At ISI < ~30 ms where individual clicks are no longer clearly heard as discrete events, a click train as a whole leads to three kinds of sensation with partial overlap [31][32][33]. At ISI of ~3-30 ms, a buzz- or rattle-like sensation is produced, defined as "roughness"; at ISI of ~5-15 ms, tone quality of sensation dominates, whose perceived frequency is directly related to waveform periodicity (defined as "periodicity pitch"); at ISI < several ms, another mode of pitch sensation dominates, which depends on the fundamental frequency ("spectral pitch"). None of the ISI ranges of these sensations fits that of region γ or region δ (Table 1), making it unlikely that our single unit data have direct relevance to these sensations. Steinschneider and colleagues [34] suggested that periodicity pitch may be represented in AI by oscillatory neuronal ensemble responses locking to the temporal envelope, and spectral pitch by rate-place coding that is sensitive to both the fundamental frequency and other harmonics in the train. Further insight into the neurophysiological bases of these sensations would be obtained by coordinated single/multiunit recordings and psychoacoustic experiments.

Figure 7 A: Scheme for a sequence of EPSPs and IPSPs (broken curves) in the AI neuron elicited by a single afferent stimulation (based on [27]). Temporal interaction between the PSPs may form four phases (solid curves) in the following order: fast excitatory (FE), fast inhibitory (FI), slow excitatory (SE), and slow inhibitory (SI) periods. Numerals represent latency-to-peak (mean ± SD). Identical time scale is applied to A-C. B: Predicted firing responses (vertical lines) elicited by paired stimuli of various ISI. When the 2nd stimulus is given during period FE (see A; likewise below) (-1), it would readily elicit firing responses. However, the discharge clusters elicited by the 1st and 2nd stimulus may merge together so that stimulus-locking responses, if any, would be weak. When the 2nd stimulus is given during period FI (-2) or SI (-3), it would hardly elicit firing responses and, consequently, stimulus-locking responses. When the 2nd stimulus is given during period SE (-4), it would readily elicit firing responses. The discharge clusters elicited by the 1st and 2nd stimulus may be clearly separated from each other by period FI, leading to intense stimulus-locking responses. C: The tMTF derived from the prediction in B (broken curve; predicted from paired stimuli) and the tMTF that was schematically drawn from the data of the synchronization type with band-pass tMTF (solid curve; observed on click trains; based on Table 1). Regions α-δ correspond to those in Figure 1A.

Comparison to Previous AI Studies in Un-anesthetized Animals regarding Cell-type Classification
An accumulating body of single unit studies in un-anesthetized animals has investigated AI responses to periodic acoustic signals such as click trains [14,23] and amplitude-modulated sounds [35][36][37][38]. Irrespective of methodological differences (e.g., stimulus configuration, electrode properties and statistics), these studies appear to agree that AI neurons comprise largely two subsets: one subset responds predominantly at long ISI (≥ ~30 ms, occasionally extending to 10 ms or less) in a stimulus-locking manner (i.e., temporal code) while the other subset responds at short ISI (< ~10 ms) in a sustained manner (i.e., rate code).
The following three findings indicate that our synchronization neurons (e.g., Fig. 1A) correspond to the subset that was reported to carry a temporal code. First, the shorter cutoff ISI for stimulus-locking responses in those studies (30-40 ms) was similar to the β-γ border of our synchronization neurons (~30 ms; Table 1). Second, at shorter ISI (10-30 ms), the studies reported that the neurons fired intermittently without stimulus locking. This feature resembles the region γ responses of our synchronization neurons. Third, at much shorter ISI (≤ 10 ms), the studies reported that only the onset response was evident. This feature, as well as its cutoff ISI, is quite akin to region δ of our synchronization neurons.
Our non-synchronization neurons with high-pass rMTF showed non-stimulus-locking responses during the presence of the click trains (e.g., Fig. 1D). Such responses occurred only at short ISI (< ~10 ms), with shorter ISI leading to a larger driven rate (D-3). This feature closely resembles the responses of the subset that was reported to carry a rate code. On the other hand, our non-synchronization neurons with low-pass rMTF (e.g., Fig. 1C) do not correspond to either subset mentioned above. They may belong to the "unclassified neurons" in Wang and colleagues' study [14], which were reported to respond in some manner to click signals without clearly defined stimulus-locking responses or non-stimulus-locking rate responses.

Comparison to Previous AI Studies of Neural Mechanisms underlying Stimulus-locking Responses
de Ribaupierre and colleagues [23] revealed that AI stimulus-locking responses are related to the temporal interplay of depolarization and hyperpolarization. While this interplay potentially results from non-synaptically mediated after-hyperpolarization [39], growing evidence indicates that this interplay is based mainly on the sequence of EPSPs and IPSPs [25,26]. In particular, Cox and colleagues [27] showed in rat AI slice preparations that EPSPs and IPSPs are mediated chiefly through AMPA/NMDA receptors and GABA_A/GABA_B receptors, respectively (Fig. 7A). To date, however, it has been unclear whether and how these PSP components are related to stimulus-locking responses of AI neurons.
On the other hand, there are physiological data suggesting that AI stimulus-locking responses are mediated through neural processes other than the interplay of PSP components. For example, Wehr and Zador [26], employing the whole-cell recording technique in the AI of ketamine-anesthetized rats, measured the excitatory and inhibitory synaptic conductances elicited by click pairs of variable ISI. They found that inhibitory conductances were too short-lived to account for suppression of spiking responses to the 2nd click, which was delivered at ISIs ≥ several hundred milliseconds. Eggermont [40], based on physiological data, proposed a model in which presynaptic facilitation and depression determine the low-pass characteristics of AI stimulus-locking responses.
Our single-cell dynamic model comprehensively integrates the above findings/suggestions in that the capacity for stimulus-locking responses (i.e., region β) is explained in terms of the temporal interplay of the PSP components along with short-term plasticity (Fig. 8A-5). This view is compatible with the pioneering work of Grothe [41] that proved the critical contribution of EPSPs and IPSPs to stimulus-locking responses of auditory brainstem neurons for encoding interaural time difference. Furthermore, our model can explain the neural processes that give rise to the other ISI-dependent AI responses such as in regions γ, δ and α (Figs. 8B-2, D-2, and 8E-2).
It has been shown, in a gap-in-noise detection paradigm where leading and trailing wideband noise had the same frequency content, that the minimum gap between the noises (i.e., ISI) for stimulus-locking responses was 30 ms for a 20-ms leading noise, 10 ms for a 50-ms leading noise, and reached an asymptote of 5 ms for a 200-ms leading noise [15,42]. This implies that the temporal resolving capacity of individual AI neurons is not fixed, but varies dynamically depending on the duration of the leading signal. Since our model was based on physiological data obtained using only 1-ms long clicks, it is best suited for neural processes that are triggered at the stimulus onset, not later in the stimulus.
The present study revealed that the recovery period of AI spiking activities was in the order of seconds (Fig. 5F). This range is consistent with the values reported in awake (> 1 s [43]) and ketamine-anesthetized animals (> 500 ms [26]) but is much longer than those measured in barbiturate-anesthetized animals (20-200 ms [44][45][46]). A number of observations indicate that barbiturate reduces spontaneous and evoked spiking activities [28,47,48]. This leads to a conjecture that frequency-dependent depression, which results chiefly from temporary exhaustion of the readily releasable neurotransmitter pool [18], is much less potent and less durable under barbiturate anesthesia than under the other conditions. Under barbiturate anesthesia, the weight of the influence on the recovery period may shift from frequency-dependent depression, which lasts for seconds [17,18], to IPSPs, which extend for at most several hundred milliseconds.

Relative Contribution to AI Stimulus-locking Responses: Intra-AI Processing vs. Sub-AI Processing
There is a marked resemblance between the predicted tMTF (Fig. 7C, broken curve) and observed tMTF (solid curve) except for the short ISI portion (shaded area; for reason, see above). Remarkably, these tMTF curves derived from rather different experimental conditions: the former was based on the data obtained by electrical stimulation to rat AI slice preparations (Fig. 7A); the latter,  for stimulus-locking responses (i.e., β-γ border; Fig. 7C, filled triangle) between un-anesthetized (~30 ms; present study and [14]) and anesthetized animals (≥ 50 ms [28,49,50]), it seems likely that the neural processes of AI stimulus-locking responses are much less susceptible to the mechanical insult of slicing the brain and to species difference than to the pharmacological effects of anesthetics. Some anesthetics, such as pentobarbital, are known to potentiate GABA_A-ergic inhibition [48]. This may enhance the fast-IPSP and consequently diminish the slow-EPSP, especially in its early phase (see Fig. 7A). As a result, the transition point from the fast-IPSP to the slow-EPSP will shift later (horizontal arrow), prolonging the β-γ border (Fig. 7C, horizontal arrow). Slice preparations, on the other hand, do not suffer such artificial enhancement of GABA_A-ergic inhibition but retain local circuitry. These considerations favor the idea that AI stimulus-locking responses are elaborated mainly through intra-AI processing rather than simple preservation of sub-AI processing.

The above idea receives support from previous studies that directly compared the best ISI within pairs of functionally connected medial geniculate body (MGB) and AI neurons [37,51]. In those studies, spiking activities of individual MGB and AI neurons were simultaneously recorded and the functional connection was confirmed if their activities showed a single cross-correlogram peak within 1-5 ms lag time, the MGB neuron leading the AI neuron, under both spontaneous and stimulus-driven conditions.
Importantly, no rank correlation was revealed for the best ISI; MGB neurons with longer (shorter) best ISI did not preferentially connect with AI neurons with longer (shorter) best ISI. This suggests that the generally observed prolongation in the best ISI from MGB to AI [52,53] cannot be simply attributed to a degradation of temporal resolution due to intrinsic membrane properties or synaptic delay but is rather due to more elaborated intra-AI processing.

Conclusion
Our physiological observations suggest that the β-γ border, the shorter cutoff ISI for stimulus-locking responses of AI neurons, serves as a neural correlate of the temporal-order threshold and VOT boundary of CV syllables. The present modeling study supports the idea that the observed ISI-dependent responses are largely mediated through temporal interplay between EPSPs and IPSPs at the thalamocortical synapse along with its short-term plasticity.
The parameter values in our model were not directly measured in the current recording study but were determined by referring to other published data (see Methods). Under this proviso, the weights of AMPA, GABA_A, NMDA and GABA_B receptors were arranged to better predict the observed phenomena. In fact, the relative weight of the receptors may vary across neurons, and other cellular and/or network mechanisms may also contribute to the observed phenomena. Further insight into this issue could be gained by measuring the membrane potential of the AI neurons during acoustic stimulation or by analyzing the effect of selective receptor antagonists on their stimulus-response features.

Methods
Experiments were performed in a manner consistent with the Guidelines for Animal Experiments, University of Yamanashi, and the Guiding Principles for the Care and Use of Animals approved by the Council of the Physiological Society of Japan. Animal preparation, recording, and histology procedures were the same as in our previous reports [54][55][56][57]. Briefly, the cats underwent surgery under pentobarbital sodium anesthesia. Aluminum cylinders and metal blocks were implanted for subsequent extracellular recording and restraining of the animal's head, respectively. At least two weeks were allowed for recovery before the recording. During the recording, the animal was placed in an electrically shielded and sound-attenuated room with its body wrapped in a cloth bag and its head restrained with a holding bar. The animals were kept awake throughout the recording period and were monitored with an online surveillance camera and electroencephalography. When drowsiness was suspected, the cat was awakened by gently tapping its body using a remote-controlled tapping tool or by briefly opening the door of the room. A glass microelectrode (tip diameter, 1.8-2.5 μm; resistance, 2-3 MΩ; filled with 2 M NaCl) was inserted into AI. Tone bursts of variable frequencies and sound pressure levels (SPLs) were presented as search signals. Single unit activities were recorded and their occurrences were identified using a window discriminator. The spike-occurrence outputs were captured on a Pentium-based computer with a time resolution of 2 μs as the digital input for data analysis. The animals sometimes voluntarily moved during recording sessions, creating artifacts in the recording. By carefully checking the online monitor screens of the animal and firing activities, these motion artifacts were marked in real time on the recording computer while recordings were in progress. Data with artifacts were rejected.
Daily recording sessions lasted < 6 hours, and the total duration of the experiment continued for 2-6 months per animal. At the termination of the experiment, some recording sites were marked with electrolytic lesions. The animals were then sacrificed with an overdose of pentobarbital sodium and perfused with 10% formalin. The brain was cut in transverse sections and stained with neutral red. The recording sites were reconstructed based on the electrolytic lesions and electrode tracks.

Sound generation and delivery
The sound signals were generated using user-written programs in MATLAB (Mathworks, Natick, MA) on a Pentium-based computer. The signals were fed into a 12-bit digital-to-analogue converter (BNC2090; National Instruments, Austin, TX) at a sampling rate of 100 kHz and to an eight-pole Chebyshev filter (P-86; NF Electric Instruments, Yokohama, Japan) with a high cutoff frequency of 20 kHz. The output was attenuated and sent to a low-output-impedance power amplifier (PMA2000III; Denon Electronic GmbH, Ratingen, Germany), and then the sound signals were presented from a speaker (K1000; AKG Acoustics, Wien, Austria) placed 2 cm away from the auricle contralateral to the recording site. We equalized and calibrated the sound delivery system between 128 and 16,000 Hz in 8 Hz steps, and the output varied by ±1.5 dB. One set of stimuli was presented at variable intervals ranging from 4.0 to 5.5 s. We employed (1) a click-train paradigm and (2) a single-click paradigm (see below).

Click-train paradigm
Once single unit activities were isolated, we conducted a click-train paradigm: rectangular clicks (1-ms duration) were delivered in a train fashion (0.5-s duration; Fig. 1A-1, horizontal line). Since the neurons examined in the present study are included in our previous study for periodicity coding [57], click repetition rate was systematically varied at 0. . This kind of compensation has been often adopted in the previous AI studies [14,28,58,59].
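As an illustration of the stimulus just described, the following minimal Python sketch builds one rectangular click train as a sequence of samples. The 1-ms click duration, the 0.5-s train duration and the 100-kHz sampling rate come from the text; the function name `click_train`, the unit amplitude and the reading of ISI as an onset-to-onset interval are assumptions made for this example and do not reproduce the authors' MATLAB programs.

```python
def click_train(isi_ms, train_ms=500.0, click_ms=1.0, fs_hz=100_000):
    """Return a list of samples (0.0 or 1.0) for a rectangular click train.

    Assumption: ISI is measured from click onset to click onset.
    """
    n_total = round(train_ms * fs_hz / 1000.0)   # samples in the whole train
    n_click = round(click_ms * fs_hz / 1000.0)   # samples in one click
    n_isi = round(isi_ms * fs_hz / 1000.0)       # samples between click onsets
    wave = [0.0] * n_total
    for onset in range(0, n_total, n_isi):
        for k in range(onset, min(onset + n_click, n_total)):
            wave[k] = 1.0
    return wave


# Example: a 0.5-s train with ISI = 30 ms
wave = click_train(isi_ms=30.0)
# Count rising edges to recover the number of clicks in the train
onsets = [i for i in range(len(wave))
          if wave[i] == 1.0 and (i == 0 or wave[i - 1] == 0.0)]
```

In this sketch the last click starts at 480 ms, so the train contains every click whose onset falls inside the 0.5-s window.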
Two measures were used to characterize response features at each SPL. The first was related to the degree of stimulus locking. First, the stimulus-locking discharges were quantified with vector strength [60]. The vector strength (VS) is calculated by the following equation:

$$VS = \frac{1}{n}\sqrt{\left(\sum_{i=1}^{n} x_i\right)^{2} + \left(\sum_{i=1}^{n} y_i\right)^{2}}$$

where $x_i = \cos\theta_i$ and $y_i = \sin\theta_i$, n is the total number of spikes, and each spike is treated as a unit vector whose phase $\theta_i$ (0-2π) is assigned relative to the ISI of interest. The vector strength ranges between 0.0 and 1.0. A value of 0.0 indicates that spike occurrence is entirely independent of the signal periodicity, whereas a value of 1.0 indicates that all spikes occur at exactly the same phase of the signal. The significance of the vector strength was assessed using the Rayleigh Z-test [61] at the 5% significance level (Fig. 1A-2, dotted line). At a given SPL, the Z value was filtered by a weighted average with its 5 neighbors in the ratio of 1:2:3:2:1 and was plotted against ISI (hereafter, "temporal modulation transfer function [tMTF]"). By comparing the tMTF obtained at different SPLs, we defined the values of the SPL and ISI that produced the maximum Z value as the "best SPL" and "best ISI" (arrow), respectively. Note that the following data were obtained at the best SPL, unless otherwise specified. If the maximum Z value of a given neuron exceeded the significance level, we judged that stimulus-locking responses were taking place (solid bar) and classified the neuron as "synchronization type"; if not, the neuron was classified as "non-synchronization type."

The second measure was based on the rate of firing activities. First, the driven rate at each ISI was calculated by subtracting the mean spontaneous firing rate from the firing rate for 50-500 ms after the onset of the click trains. Then, the driven rate was filtered by a weighted average with its 5 neighbors in the ratio of 1:2:3:2:1 and was plotted against ISI (hereafter, "rate modulation transfer function [rMTF]"; Fig. 1A-3). When this measure exceeded the threshold for excitation (= 2*SD of the mean spontaneous firing rate; dotted line), we judged that a rate response was taking place (solid bar).
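The two measures above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the vector strength is taken in its standard form (the length of the mean resultant of unit vectors at the spike phases), the Rayleigh statistic is assumed to be Z = n·VS², and the 1:2:3:2:1 filter is assumed to renormalize its weights at the two ends of the ISI axis. The function names are hypothetical.

```python
import math


def vector_strength(spike_times_ms, isi_ms):
    """Length of the mean resultant vector of spike phases within one ISI."""
    n = len(spike_times_ms)
    if n == 0:
        return 0.0
    phases = [2.0 * math.pi * (t % isi_ms) / isi_ms for t in spike_times_ms]
    sum_x = sum(math.cos(p) for p in phases)
    sum_y = sum(math.sin(p) for p in phases)
    return math.hypot(sum_x, sum_y) / n


def rayleigh_z(spike_times_ms, isi_ms):
    """Rayleigh statistic, assumed here to be Z = n * VS**2."""
    n = len(spike_times_ms)
    return n * vector_strength(spike_times_ms, isi_ms) ** 2


def smooth_12321(values):
    """Weighted moving average with weights 1:2:3:2:1.

    Assumption: near the ends, missing neighbors are dropped and the
    remaining weights are renormalized.
    """
    weights = (1, 2, 3, 2, 1)
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for w, j in zip(weights, range(i - 2, i + 3)):
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


# Spikes locked to the same phase of a 30-ms ISI vs. spikes spread evenly
locked = [5.0, 35.0, 65.0, 95.0]
spread = [0.0, 7.5, 15.0, 22.5]
vs_locked = vector_strength(locked, 30.0)
vs_spread = vector_strength(spread, 30.0)
```

A tMTF in this sketch would be `smooth_12321` applied to the list of Z values ordered by ISI; an rMTF would be the same filter applied to the driven rates.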
To estimate the recovery time course of neural responsivity, we presented a single click (= probe stimulus) at 1.0, 1.8 or 3.6 s after the termination of the click trains and measured the number of evoked spikes (#spikes) (Fig.  4D). For simplicity, we set ISI of the click trains at either best ISI or 0.9 multiples of the γ-δ border (see Results).

Single-click paradigm
To assess the involvement of suppressive processes in the stimulus-locking responses, we conducted a single-click paradigm: a single rectangular click (1-ms duration, at the best SPL) was presented 10-20 times at intervals of 4.0-5.5 s. The spike occurrence was examined with dot raster plots (Fig. 6A-1) and a post-latency time histogram (Fig. 6B-2). For constructing the latter, we first obtained a peristimulus time histogram (bin width = 2 ms). We defined the response latency as the beginning of three consecutive bins in which the firing rate exceeded the threshold for excitation (see above). Then, we obtained a post-latency time histogram for 500 ms after the latency (bin width = 10 ms). The threshold for "suppression" was set at the mean minus 2*SD of the spontaneous firing rate (dotted line, bottom). Note that this paradigm was executed only for part of the AI neurons (n = 35) that showed an appreciable amount of spontaneous firing, enough to make the threshold for suppression > 0 (spikes/s).
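The latency criterion and the suppression threshold described above can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the function names are hypothetical, and the excitation threshold is assumed here to be applied to the raw firing rate as the spontaneous mean plus 2*SD.

```python
def response_latency_ms(rates, excite_thresh, bin_ms=2.0, n_consecutive=3):
    """Start time of the first run of n_consecutive supra-threshold bins.

    rates: firing rate per PSTH bin (spikes/s). Returns None if no run exists.
    """
    run = 0
    for i, rate in enumerate(rates):
        run = run + 1 if rate > excite_thresh else 0
        if run == n_consecutive:
            return (i - n_consecutive + 1) * bin_ms
    return None


def classify_bins(rates, spont_mean, spont_sd):
    """Label each post-latency bin as excitation, suppression, or none.

    Assumption: excitation = rate above mean + 2*SD of spontaneous rate;
    suppression = rate below mean - 2*SD, as stated in the text.
    """
    upper = spont_mean + 2.0 * spont_sd
    lower = spont_mean - 2.0 * spont_sd
    labels = []
    for rate in rates:
        if rate > upper:
            labels.append("excitation")
        elif rate < lower:
            labels.append("suppression")
        else:
            labels.append("none")
    return labels


# Toy PSTH (2-ms bins): an isolated supra-threshold bin is ignored,
# and the latency is the start of the first three-bin run.
latency = response_latency_ms([5, 5, 30, 5, 30, 30, 30, 5], excite_thresh=20.0)
labels = classify_bins([40.0, 10.0, 1.0], spont_mean=10.0, spont_sd=3.0)
```

The lower threshold is positive only when the spontaneous mean exceeds 2*SD, which is why the paradigm requires neurons with appreciable spontaneous firing.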

Statistics
Physiological data were presented as the mean ± SD. If necessary, #spikes was normalized to the value elicited by the 1st click in the train that was delivered at the best ISI (= control level; e.g., Fig. 4C). In general, we employed Student's t-test or a one-way repeated-measures analysis of variance (ANOVA) followed by the post-hoc Fisher protected least-significant difference test (PLSD) for pairwise comparisons. The significance level was set at P < 0.05 against a null hypothesis of equal performance.

Minimal cortical models
We adopted a single-cell dynamic model for describing the response kinetics. Simulation of the model was performed using XPPAUT, developed by G.B. Ermentrout and available at http://www.math.pitt.edu/~bard/xpp/xpp.html. Although the model is minimal, it produces a good approximation of the spike shapes and temporal response properties experimentally observed in the present study. The principal equation describing the change in the membrane potential $V_m$ (mV) of a neuron at the soma is given by the following current balance equation:

$$C_m \frac{dV_m}{dt} = I_{ion} + I_{syn} + I_{app} + \mathrm{noise} \qquad (1)$$

where $C_m$ is the membrane capacitance (1 μF/cm²). The right-hand side incorporates the intrinsic ionic currents ($I_{ion}$), synaptic currents ($I_{syn}$) and external input ($I_{app}$). In addition, the model includes a noise current that comes from a presynaptic neuron (noise; λ = 500 Hz). The presynaptic neuron fires randomly with a uniform distribution in time. The usual method of integration was a fourth-order Runge-Kutta method with a time step of dt = 0.02.
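To make the integration scheme concrete, the sketch below applies a fourth-order Runge-Kutta step with dt = 0.02 to a deliberately reduced form of the current balance equation that keeps only a leak current and a constant applied current. It is not the authors' XPPAUT model: the leak conductance and reversal potential (`G_LEAK`, `E_LEAK`) are hypothetical placeholder values, the sign convention is chosen so that the leak is restoring, and the time unit of dt is assumed to be ms.

```python
C_M = 1.0        # membrane capacitance, uF/cm^2 (value from the text)
DT = 0.02        # integration time step (from the text; ms assumed)
G_LEAK = 0.3     # mS/cm^2, hypothetical placeholder
E_LEAK = -54.4   # mV, hypothetical placeholder


def rk4_step(f, t, v, dt):
    """One classical fourth-order Runge-Kutta step for dv/dt = f(t, v)."""
    k1 = f(t, v)
    k2 = f(t + dt / 2.0, v + dt * k1 / 2.0)
    k3 = f(t + dt / 2.0, v + dt * k2 / 2.0)
    k4 = f(t + dt, v + dt * k3)
    return v + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def simulate(v0, t_end, i_app=0.0):
    """Integrate C_M * dV/dt = -G_LEAK * (V - E_LEAK) + i_app from v0."""
    def dvdt(_t, v):
        return (-G_LEAK * (v - E_LEAK) + i_app) / C_M

    v, t = v0, 0.0
    for _ in range(round(t_end / DT)):
        v = rk4_step(dvdt, t, v, DT)
        t += DT
    return v


v_rest = simulate(v0=-65.0, t_end=100.0)
v_driven = simulate(v0=-65.0, t_end=100.0, i_app=3.0)
```

With these placeholder values the membrane relaxes to `E_LEAK` at rest and to `E_LEAK + i_app / G_LEAK` under constant drive, which provides a simple check that the integrator behaves as expected before voltage-dependent and synaptic currents are added.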

Intrinsic Current
$I_{ion}$ (equation 1) represents the sum of intrinsic ionic currents, which are contributed mainly by the voltage-dependent sodium current ($I_{Na}$), potassium current ($I_K$) and leak current ($I_{leak}$).
Each current was modeled in terms of Hodgkin and Huxley type first-order kinetics [62]. For $I_{Na}$, the values of the parameters were derived from the experimental data and other models [63,64] (the same applies to $I_K$ and $I_{leak}$) and are as follows: $\bar{g}_{Na}$ (maximal conductance of the sodium channel) = 120 mS/cm², p = 3, q = 1, $E_{Na}$ = 50 mV.

The equation for the $I_{Na}$ activation variable is:

$$\frac{dm}{dt} = \frac{m_\infty - m}{\tau_m}$$

where $m_\infty$ is the equilibrium value and $\tau_m$ is the time constant of m as a function of $V_m$, with the forward and backward rate constants $\alpha_m$ and $\beta_m$.

The equation for the $I_{Na}$ inactivation variable is:

$$\frac{dh}{dt} = \frac{h_\infty - h}{\tau_h}$$

where $h_\infty$ is the equilibrium value and $\tau_h$ is the time constant of h as a function of $V_m$, with the forward and backward rate constants $\alpha_h$ and $\beta_h$.

For $I_K$, $\bar{g}_K$ = 36 mS/cm², j = 4, $E_K$ = -77 mV. The equation for the $I_K$ activation variable is:

$$\frac{dn}{dt} = \frac{n_\infty - n}{\tau_n}$$

where $n_\infty$ is the equilibrium value and $\tau_n$ is the time constant of n as a function of $V_m$:

$$n_\infty = \frac{\alpha_n}{\alpha_n + \beta_n} \qquad (16)$$

$$\tau_n = (\alpha_n + \beta_n)^{-1} \qquad (17)$$

with the forward and backward rate constants $\alpha_n$ and $\beta_n$.

In equation (28), $k_{sn}$ = 1 ms⁻¹, $\tau_{NMDA}$ = 120 ms, $k_{xn}$ = 1 ms⁻¹ and $\tau_{NMDA}$ = 14 ms.
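Equations (16) and (17) express the equilibrium value and time constant of the potassium activation variable in terms of its forward and backward rate constants. The short Python sketch below evaluates them. Because the rate-constant expressions used in the model are not available in this text, the classic Hodgkin-Huxley forms for $\alpha_n$ and $\beta_n$ (shifted to a resting potential of -65 mV) are substituted purely as an assumption; all function names are hypothetical.

```python
import math


def alpha_n(v_mv):
    """Forward rate constant (1/ms). ASSUMED classic Hodgkin-Huxley form."""
    x = v_mv + 55.0
    if abs(x) < 1e-9:
        return 0.1  # limit of 0.01 * x / (1 - exp(-x / 10)) as x -> 0
    return 0.01 * x / (1.0 - math.exp(-x / 10.0))


def beta_n(v_mv):
    """Backward rate constant (1/ms). ASSUMED classic Hodgkin-Huxley form."""
    return 0.125 * math.exp(-(v_mv + 65.0) / 80.0)


def n_inf(v_mv):
    """Equilibrium value, equation (16): alpha / (alpha + beta)."""
    a, b = alpha_n(v_mv), beta_n(v_mv)
    return a / (a + b)


def tau_n(v_mv):
    """Time constant in ms, equation (17): 1 / (alpha + beta)."""
    return 1.0 / (alpha_n(v_mv) + beta_n(v_mv))


n_rest = n_inf(-65.0)     # equilibrium activation near rest
tau_rest = tau_n(-65.0)   # corresponding time constant
```

The same two functions apply unchanged to the m and h gating variables once their own rate constants are supplied, since each gate follows first-order kinetics toward its equilibrium value.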

External Input
The external input ($I_{app}$) was modeled as 1-ms-long repetitive square-wave pulses (initial amplitude = 12.5 μA/cm²) with or without frequency-dependent depression, which results mainly from temporary exhaustion of the readily releasable neurotransmitter pool at presynaptic terminals [18].
here P is the release probability and N is the number of readily releasable vesicles. We set k = 0.05 and the initial value of N = 5, then the above equation gives the initial release probability P 1 = 0.43. N is modeled in terms of the depletion vs. refilling dynamics of the vesicles: where λ d and λ r are the depletion and refilling time constant, respectively, while N c represents the maximum size of the readily releasable vesicles, and n d represents the number of stimuli required to deplete the vesicles. Parameter values are set such that λ r = 0.05, N c = 15 and n d = 14.
These values were chosen from the range given in previous reports [75,76] and were selected to fit the present data that most probably reflect frequency-dependent depression (Fig. 4C, open circles). Since synaptic potency, which is defined as the average size of the synaptic response when transmitter release does occur, has been shown not to vary considerably despite changes in release probability [75,77], we considered that $I_{app}$ is linearly related to P(N).
We have often observed, as already reported by Dobrunz and Stevens (1997) [75], that experimentally measured responses to the 2nd click far exceeded the value calculated from equation (34), especially at very short ISIs (i.e., "paired-pulse facilitation"), possibly due to a residual elevation in the presynaptic intracellular calcium concentration [19,78]. To compensate for this, we substituted the following equation to describe the release probability of the responses to the 2nd signal in the train, $P_2$: here, int = 1000/$f_r$. The best fit of this kinetic scheme to the present data (Fig. 5D, filled circle) gave $C_1$ = 0.2, $C_2$ = 1.3 and $\tau_1$ = 20 ms.