Naturalistic Behavioural Experiments
Stimuli
We used two different types of acoustic stimuli: spoken words ('Speech') [see Additional file 2] and non-linguistic artificially produced sounds ('Artificial') [see Additional file 3]. In Germany, the recorded spoken words were "Geh" (go!), "Komm" (come!), "Rad" (wheel), and "Kinn" (chin); in the UK we used "go!", "come!", "rat", "chin" and "Kinn" (from the German stimuli because this sounded similar to 'kin'). Half of the human utterances in each category were produced by two female speakers and the other half by two male speakers. The artificial sounds consisted of a total of 12 artificial stimuli generated from natural non-biological sounds which were distorted such that they could no longer be attributed to natural events. Stimuli were taken from a subset of sounds employed in a previous study [30]. These stimuli were produced by cutting and distorting portions of longer commercial sound files, with a duration of 200 ms including 10 ms rise and 40 ms fall time. Based on the results of a subjective rating study, only sounds which were classified as being unidentifiable by fifteen subjects were used in the present study (for further details, see [30]). All auditory stimuli were normalized for sound pressure level. The average stimulus duration was 330 ± 135 ms (mean ± SD).
Procedure
Participants for this study were chosen opportunistically, depending on their posture. Subjects had to stand still in a straight, upright posture at the time of the trial. In addition, subjects' heads were required to be straight with respect to the body on the horizontal axis (i.e. looking neither left nor right), while they could be looking up or down (i.e. bent on the vertical axis) as this should have no effect on preference to turn either right or left. Experiments in Germany were conducted in Göttingen; those in the UK in London and Cambridge.
Two experimenters participated in each trial. The first used a PalmOS handheld (Tungsten E2, Palm Inc., Wokingham, Berkshire, UK) to record information about each trial. Pendragon Forms mobile data software was used to create a list of information to be recorded in a database for each trial. A second experimenter (the initiator) approached each subject from behind. The initiator positioned herself, with portable speakers (Travelsound 400, Creative, Dublin, Ireland or SP-106 Travel Speakers, Tesco Technika, Hertfordshire, England) and an mp3 player (Zen Nomad Jukebox, Creative, Dublin, Ireland), directly behind the subject at a distance of approximately 1 m. The sound was presented approximately 0.2–0.5 m below the ear level of the listener, so that the presentation was not too conspicuous. The recorded sound was then played back at a peak sound pressure level of 76.7 ± 3.9 dB re 20 μPa measured at a distance of 1 m (sound level meter Rion NL-05, fast mode). If the subject did not respond immediately, the sound was played again, up to a maximum of three presentations, after which the trial was aborted if the subject had still not responded. Of the participants, 62% responded to the first presentation, 30% to the second, and 8% to the third. Turning biases did not differ in relation to the number of presentations (χ2 = 0.83, N = 224, df = 2). Importantly, the initiator was instructed to play back sounds from a position directly behind the subject but was blind to the purpose of the study and also to the specific stimulus category played back in a given trial. Following each successful test, the subject was approached by both experimenters. The purpose of the experiment was explained and the subject's handedness was assessed. In addition, subjects were asked how long they had been living in the respective country. People who had been living in the country for less than 10 years were excluded from the analysis.
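The chi-square statistic reported above can be converted to a p-value by hand, because for exactly 2 degrees of freedom the upper-tail probability of the chi-square distribution reduces to exp(-x/2). The following is a minimal sketch of that arithmetic; the function name is ours and is not part of the original analysis, which was run in SPSS.

```python
import math


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variate with exactly 2 df.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.exp(-x / 2.0)


# Statistic reported for turning bias vs. number of presentations
p_value = chi2_sf_df2(0.83)
print(f"p = {p_value:.3f}")  # p = 0.660
```

A p-value of this size is consistent with the statement that turning biases did not differ with the number of presentations.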
The following information was recorded on the PalmOS handheld: the sound played, the number of times it was played (up to a maximum of 3), the direction of the subject's orienting response (right or left), and the subject's gender and handedness. Since we obtained too few data for a meaningful analysis of left-handed people, these data were discarded. The final sample consisted of 224 successful trials, each with a different individual. All people tested consented to the use of their data.
Statistical Analysis
We used binomial tests to examine orienting biases within the stimulus categories separately for each population (country). Results were corrected for multiple testing using a sequential Bonferroni correction (Hochberg step-up procedure). In addition, a multinomial regression was used to test for the effects of country, stimulus category, and gender as well as the interactions between the main factors. All tests were calculated using SPSS 15.0.
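The combination of binomial tests and a Hochberg step-up correction can be sketched as follows. This is an illustrative re-implementation using only the Python standard library, not the SPSS procedure used in the study; the function names and the example p-values are hypothetical.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test against chance (p = 0.5).

    Under p = 0.5 the distribution is symmetric, so the two-sided
    p-value is twice the smaller tail, capped at 1.
    """
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def hochberg_reject(p_values, alpha=0.05):
    """Hochberg step-up procedure.

    Sort p-values in ascending order, then scan from the largest
    downwards. The first rank i with p_(i) <= alpha / (m - i + 1)
    leads to rejection of all hypotheses ranked 1..i.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank in range(m, 0, -1):
        if p_values[order[rank - 1]] <= alpha / (m - rank + 1):
            for j in order[:rank]:
                reject[j] = True
            break
    return reject


# Hypothetical p-values, one per stimulus category and country
example_p = [0.01, 0.04, 0.03, 0.5]
print(hochberg_reject(example_p))  # [True, False, False, False]
```

Because the scan starts from the largest p-value, the step-up procedure is at least as powerful as the step-down (Holm) variant: with two p-values of 0.04 and 0.045, Hochberg rejects both at alpha = 0.05, whereas Holm rejects neither.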
fMRI Experiment
Participants
Twenty-two right-handed, healthy young volunteers (10 female, aged 22 to 34 years, mean age = 25.95 years) participated in the study. Handedness was assessed with the Edinburgh Handedness Inventory (Oldfield, 1971). Only subjects with normal hearing acuity were included. After being informed about potential risks and screened by a physician of the institution, subjects gave informed consent before participating. The experimental standards were approved by the local ethics committee of the University of Leipzig. Data were handled anonymously.
Stimuli
In addition to the stimuli described above (Speech and Artificial sounds), we also used time-reversed spoken words (Reverse) and non-linguistic human utterances as part of a separate investigation. In the Reverse class, the same stimuli as in the Speech class were used but played back in reverse. To further test whether hemispheric lateralization concerns species-specificity or acoustic features, we included a Human Sound class (the Voice condition below) that comprised coughing, harrumphing, humming, and clicking one's tongue, i.e. four different non-linguistic species-specific sounds. Data from these conditions will not be considered further in the present paper. Half of the human utterances of all three classes were produced by two female speakers and the other half by two male speakers.
Presentation
The experiment comprised four conditions (Speech, Voice, Reverse, Artificial) and a visual control (Figure 3). Conditions were presented in randomized order (mixed trial design), and trial order differed for each subject. Within each trial of each condition, stimulation lasted 350 ms and was preceded by a variable gap (jittering) of either 830 ms, 1103 ms, 1376 ms, or 1649 ms. A question mark was presented 2000 ms after stimulus onset. This stimulus-onset asynchrony was required in order to analyze brain activity without effects of the motor response. The question mark signalled the beginning of the response period, which lasted at most 2000 ms; earlier responses terminated the question mark presentation. The next trial started after a variable interval (2350 to 3170 ms after the onset of the question mark, depending on the jittering at the beginning of the trial). The trial-onset asynchrony was 6 seconds. Except for the time during which the question mark was presented, the screen showed a fixation cross of 10 mm width and 10 mm height.
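The trial timing can be checked with a few lines of arithmetic. The sketch below assumes that the 2000 ms delay of the question mark is measured from stimulus onset; under that assumption, the interval between question mark onset and the next trial follows directly from the 6-second trial length and the four jitter values. The constant and function names are ours.

```python
TRIAL_MS = 6000                      # trial-onset asynchrony
CUE_DELAY_MS = 2000                  # question mark after stimulus onset (assumption)
GAPS_MS = (830, 1103, 1376, 1649)    # jittered gap before the stimulus


def post_cue_interval(gap_ms: int) -> int:
    """Time from question mark onset to the start of the next trial."""
    return TRIAL_MS - gap_ms - CUE_DELAY_MS


for gap in GAPS_MS:
    print(gap, post_cue_interval(gap))
# 830 3170
# 1103 2897
# 1376 2624
# 1649 2351
```

The longest interval matches the reported 3170 ms exactly, and the shortest (2351 ms) is within 1 ms of the reported 2350 ms.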
For all auditory conditions, sounds were presented as if coming from different locations in space. Five virtual sources were simulated by introducing different inter-aural time differences (ITDs): 0° (i.e. no ITD), 0.1 ms left (right) channel precedence (~10° deviance), or 0.2 ms left (right) channel precedence (~20° deviance). Within each auditory condition, the probability for a sound to be presented from 0° was 0.31; the probability for any other source was 0.167. In a visual control condition, a second cross was presented in addition to the original fixation cross 0.86° (sign centre) either to its right or to its left side. Stimulus presentation time was identical to that in the auditory conditions.
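One simple way to impose an inter-aural time difference on a mono recording is to delay the lagging channel by the corresponding number of samples. The sketch below illustrates the idea only; the function name, the plain-list signal representation, and the sampling rate used in the example are assumptions and do not describe how the stimuli in this study were actually generated.

```python
def apply_itd(mono, itd_s, fs_hz, leading):
    """Return (left, right) channels with the lagging channel delayed.

    mono    -- sequence of samples
    itd_s   -- inter-aural time difference in seconds
    fs_hz   -- sampling rate in Hz (hypothetical value in the example)
    leading -- "left", "right", or "none" (0 degrees, no ITD)
    """
    if leading not in ("left", "right", "none"):
        raise ValueError("leading must be 'left', 'right', or 'none'")
    delay = 0 if leading == "none" else round(itd_s * fs_hz)
    lead_ch = list(mono) + [0.0] * delay   # pad so both channels have equal length
    lag_ch = [0.0] * delay + list(mono)    # delayed copy
    if leading == "right":
        return lag_ch, lead_ch
    return lead_ch, lag_ch


# 0.2 ms left-channel precedence at an assumed 10 kHz sampling rate = 2 samples
left, right = apply_itd([1.0, 0.5], 0.0002, 10_000, "left")
print(left)   # [1.0, 0.5, 0.0, 0.0]
print(right)  # [0.0, 0.0, 1.0, 0.5]
```

At common audio sampling rates an ITD of 0.1 ms does not correspond to a whole number of samples, so a real implementation would need either rounding, as here, or fractional-delay interpolation.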
A total of 320 trials were presented, including 20 trials for the control condition and 20 empty trials (null events). Among the auditory conditions, the number of trials was chosen such that (i) biological sounds (Speech, Voice, and Reverse) and non-biological sounds (Artificial) were equally probable; (ii) among biological sounds, linguistic (Speech) and non-linguistic (Voice, Reverse) sounds were equally probable; and (iii) among non-linguistic sounds, familiar/meaningful (Voice) and unfamiliar/meaningless (Reverse) sounds were equally probable. Accordingly, we presented 70 trials for Speech, 35 trials for Voice, 35 trials for Reverse, and 140 trials for Artificial.
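The trial counts follow from halving the auditory trials at each level of the hierarchy described above. A short sketch of that allocation (the function name is ours):

```python
def allocate_trials(total: int, control: int, null: int) -> dict:
    """Split the auditory trials by successive halving.

    biological vs. non-biological, then linguistic vs. non-linguistic,
    then familiar vs. unfamiliar.
    """
    auditory = total - control - null
    artificial = auditory // 2            # non-biological half
    biological = auditory - artificial
    speech = biological // 2              # linguistic half
    non_linguistic = biological - speech
    voice = non_linguistic // 2           # familiar/meaningful half
    reverse = non_linguistic - voice
    return {
        "Speech": speech,
        "Voice": voice,
        "Reverse": reverse,
        "Artificial": artificial,
        "Control": control,
        "Null": null,
    }


counts = allocate_trials(total=320, control=20, null=20)
print(counts)
# {'Speech': 70, 'Voice': 35, 'Reverse': 35, 'Artificial': 140, 'Control': 20, 'Null': 20}
```

The counts sum to 320, of which 280 are auditory trials.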
Task Instructions
For the auditory conditions, participants were instructed to indicate whether they heard the stimulus coming from the right (by pressing the right-hand button) or from the left (by pressing the left-hand button). Participants were told that some stimuli were more difficult to locate than others, but they were naïve about the occurrence of stimuli coming from the 0° position. Participants were asked to respond as fast as possible once the question mark appeared. For the visual control condition, participants were asked to indicate whether the second cross appeared to the left or to the right of the fixation cross by pressing the corresponding response button. Participants received these instructions before the fMRI experiment.
Data Acquisition
In the MRI session, subjects lay supine on the scanner bed with their right and left index fingers positioned on the response buttons. In order to prevent postural adjustments, the subject's arms and hands were carefully stabilized with tape. In addition, form-fitting cushions were used to prevent arm, hand and head motion. Participants were provided with earplugs (Bilsom model 303) to attenuate scanner noise (by up to 41.8 dB). The auditory stimuli were presented via non-air tubes and through magnetic resonance-compatible electrostatic headphones ('Commander XG', Resonance Technology), which attenuate gradient noise by about 30 dB. Imaging was performed at 3T on a Bruker Medspec 30/100 system equipped with the standard birdcage head coil. Twelve axial slices (field of view 192 mm, 64 by 64 pixel matrix, thickness 3 mm, spacing 1 mm) parallel to the AC-PC plane were acquired using a single-shot gradient EPI sequence (TE = 30 ms, flip angle 90°, TR = 2000 ms) sensitive to BOLD contrast. Acquisition of the slices within the TR was arranged so that all slices were acquired rapidly during the first 830 ms, followed by a "silent" period without acquisition (1170 ms) to complete the TR. This protocol, sometimes called a bunched-early sequence, was used in order to present auditory stimuli without gradient noise and thus to enhance auditory perception. A set of 2D anatomical images was acquired for each subject immediately prior to the functional experiment, using an MDEFT sequence (256 × 256 pixel matrix). In a separate session, high-resolution whole-brain images were acquired from each subject to improve the localization of activation foci, using a T1-weighted 3D segmented MDEFT sequence.
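The relation between the bunched-early acquisition and the stimulus timing given under Presentation can be checked numerically. The sketch below assumes that each trial starts at the beginning of a TR (the 6-second trial length equals three TRs) and that the jittered gap is measured from trial onset; neither assumption is stated explicitly in the text. Under these assumptions, every 350 ms stimulus falls entirely within the silent part of a TR.

```python
TR_MS = 2000                         # repetition time
ACQ_MS = 830                         # slice acquisition at the start of each TR
STIM_MS = 350                        # stimulus duration
GAPS_MS = (830, 1103, 1376, 1649)    # jittered stimulus onsets within a trial


def in_silent_period(onset_ms: int, duration_ms: int = STIM_MS) -> bool:
    """True if a stimulus lies entirely in the acquisition-free part of a TR.

    onset_ms is measured from trial onset, assumed to coincide with a TR onset.
    """
    start = onset_ms % TR_MS
    end = start + duration_ms
    return start >= ACQ_MS and end <= TR_MS


for gap in GAPS_MS:
    print(gap, gap + STIM_MS, in_silent_period(gap))
# 830 1180 True
# 1103 1453 True
# 1376 1726 True
# 1649 1999 True
```

The earliest onset coincides with the end of slice acquisition (830 ms) and the latest stimulus ends 1 ms before the next TR begins, which suggests the jitter values were chosen to span the silent period.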
Data Analysis
To assess whether responses to centrally presented stimuli differed in relation to the stimulus class, we calculated a repeated-measures ANOVA of the proportion of right responses with stimulus as a two-level within-subject factor (Speech, Artificial) and gender as a between-subjects factor. The imaging data were processed using the software package LIPSIA [31]. This software package contains tools for pre-processing, co-registration, statistical evaluation, and visualization of fMRI data [see Additional file 4].