  • Poster presentation
  • Open Access

Learning speech recognition from songbirds

BMC Neuroscience 2013, 14 (Suppl 1): P210

https://doi.org/10.1186/1471-2202-14-S1-P210

Keywords

  • Speech Recognition
  • Recurrent Neural Network
  • Recognition Model
  • Predictive Code
  • Computational Explanation

Our knowledge of the computational mechanisms underlying human learning and recognition of speech is still very limited [1]. One difficulty in deciphering exactly how humans recognize speech is that experimental findings at the neuronal, microscopic level are scarce. Here, we show that our neuronal-computational understanding of speech learning and recognition may be vastly improved by looking at a different species, the songbird, which faces the same challenge as humans: to learn and decode, in an online fashion, complex auditory input partitioned into sequences of syllables [2]. Motivated by striking similarities between the human and songbird neural recognition systems at the macroscopic level [3, 4], we assumed that the human brain uses the same computational principles at the microscopic level and translated a birdsong model [5] into a model of human speech learning and recognition. The model performs a Bayesian version of dynamical predictive coding [6] based on an internal generative model of how speech dynamics are produced. This generative model consists of a two-level hierarchy of recurrent neural networks, similar to the song production hierarchy of songbirds [7]. In this predictive coding scheme, predictions about the future trajectory of the speech stimulus are formed dynamically from a learned repertoire and the ongoing stimulus. The hierarchical inference uses top-down and bottom-up messages, which aim to minimize an error signal, the so-called prediction error.
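To make the idea of minimizing prediction error across a two-level hierarchy concrete, the following is a minimal sketch and not the model used in this work. It assumes a static, linear, scalar generative model in place of the recurrent neural networks described above, and every name and parameter value (`infer`, `w1`, `w2`, `prior`, the variances `s0`, `s1`, `s2`, the step size `lr`) is hypothetical. Top-down predictions are compared with the input and with the lower-level state, and the resulting errors drive gradient updates of both hidden quantities.

```python
def infer(y, w1=2.0, w2=0.5, prior=1.0, s0=1.0, s1=1.0, s2=1.0,
          lr=0.05, steps=500):
    """Toy two-level predictive coding (illustrative assumption only).

    Assumed generative model:
        v2 ~ Normal(prior, s2)        top level
        v1 ~ Normal(w2 * v2, s1)      lower level, predicted top-down
        y  ~ Normal(w1 * v1, s0)      observation, predicted from v1
    Inference is gradient descent on the summed, variance-weighted
    squared prediction errors.
    """
    v1, v2 = 0.0, prior              # initial guesses for the hidden states
    history = []                     # objective value at every step
    for _ in range(steps):
        e0 = y - w1 * v1             # sensory prediction error
        e1 = v1 - w2 * v2            # error of the top-down prediction of v1
        e2 = v2 - prior              # deviation of v2 from its prior
        history.append(0.5 * (e0 ** 2 / s0 + e1 ** 2 / s1 + e2 ** 2 / s2))
        # Each state is pushed by the error from below (bottom-up message)
        # and pulled by the error against its own top-down prediction.
        v1 += lr * (w1 * e0 / s0 - e1 / s1)
        v2 += lr * (w2 * e1 / s1 - e2 / s2)
    return v1, v2, history


if __name__ == "__main__":
    v1, v2, history = infer(3.0)
    print("estimates:", v1, v2)
    print("objective, first and last step:", history[0], history[-1])
```

In this linear Gaussian toy case the objective is quadratic, so the descent settles at the posterior mode of the assumed model. The dynamical scheme described above additionally predicts the future trajectory of the stimulus, which this static sketch does not attempt.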

We show that the resulting neurobiologically plausible model can learn words rapidly and recognize them robustly, even in adverse conditions. The model can also deal with variations in speech rate and with competition from multiple speakers. In addition, we show that recognition can be performed even when words are spoken by different speakers and with different accents, an everyday situation in which current state-of-the-art speech recognition models often fail. We use the model to provide computational explanations for inter-individual differences in accent adaptation, as well as for age-of-acquisition effects in second language learning. For the latter, we qualitatively modeled behavioral results from an experimental study [8].

Authors’ Affiliations

(1)
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, 04103, Germany
(2)
Biomagnetic Center, Hans Berger Clinic for Neurology, University Hospital Jena, Friedrich-Schiller-University Jena, 07747, Germany

References

  1. Hickok G, Poeppel D: Opinion - The cortical organization of speech processing. Nat Rev Neurosci. 2007, 8 (5): 393-402. 10.1038/nrn2113.
  2. Prather JF, Nowicki S, Anderson RC, Peters S, Mooney R: Neural correlates of categorical perception in learned vocal communication. Nat Neurosci. 2009, 12 (2): 221-228. 10.1038/nn.2246.
  3. Bolhuis JJ, Okanoya K, Scharff C: Twitter evolution: converging mechanisms in birdsong and human speech. Nat Rev Neurosci. 2010, 11 (11): 747-759.
  4. Doupe AJ, Kuhl PK: Birdsong and human speech: Common themes and mechanisms. Annu Rev Neurosci. 1999, 22: 567-631. 10.1146/annurev.neuro.22.1.567.
  5. Yildiz IB, Kiebel SJ: A Hierarchical Neuronal Model for Generation and Online Recognition of Birdsongs. PLoS Comput Biol. 2011, 7 (12): e1002303. 10.1371/journal.pcbi.1002303.
  6. Friston KJ, Trujillo-Barreto N, Daunizeau J: DEM: A variational treatment of dynamic systems. Neuroimage. 2008, 41 (3): 849-885. 10.1016/j.neuroimage.2008.02.054.
  7. Fee MS, Kozhevnikov AA, Hahnloser RHR: Neural mechanisms of vocal sequence generation in the songbird. Annals of the New York Academy of Sciences. 2004, 1016: 153-170. 10.1196/annals.1298.022.
  8. Meador D, Flege JE, Mackay IRA: Factors affecting the recognition of words in a second language. Bilingualism: Language and Cognition. 2000, 3: 55-67. 10.1017/S1366728900000134.

Copyright

© Yildiz et al; licensee BioMed Central Ltd. 2013

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
